Flink Serialization and case class fields limit

Andrea Sella-2
Hey squirrels,

I've started studying Flink serialization and its "type system" in more depth.

I have a case class generated by ScalaPB that has more than 30 fields. I've seen that Flink still uses the CaseClassSerializer and that the TypeInformation is CaseClassTypeInfo, even though the docs [1] say otherwise (a 22-field limit). I'd have expected a GenericTypeInfo, but that's fine with me, since the CaseClassSerializer is faster than Kryo. Did I misunderstand the documentation, or does the limitation no longer apply?
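A minimal sketch of how such a check can be done, assuming the `flink-scala` dependency is on the classpath. `WideRecord` and `TypeInfoCheck` are hypothetical names: `WideRecord` is a hand-written stand-in for the generated ScalaPB class, not the real message. The sketch only prints which TypeInformation and serializer classes Flink derives, so the result can be inspected for the Flink and Scala versions in use.

```scala
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.GenericTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo

// Hypothetical stand-in for a generated case class with more than 22 fields.
case class WideRecord(
  f1: Int, f2: Int, f3: Int, f4: Int, f5: Int, f6: Int,
  f7: Int, f8: Int, f9: Int, f10: Int, f11: Int, f12: Int,
  f13: Int, f14: Int, f15: Int, f16: Int, f17: Int, f18: Int,
  f19: Int, f20: Int, f21: Int, f22: Int, f23: Int)

object TypeInfoCheck {
  def main(args: Array[String]): Unit = {
    // The createTypeInformation macro analyses the type at compile time.
    val info: TypeInformation[WideRecord] = createTypeInformation[WideRecord]
    // Ask the type information for the serializer Flink would use at runtime.
    val serializer = info.createSerializer(new ExecutionConfig)

    println(s"type info:  ${info.getClass.getName}")
    println(s"serializer: ${serializer.getClass.getName}")
    println(s"is CaseClassTypeInfo: ${info.isInstanceOf[CaseClassTypeInfo[_]]}")
    println(s"is GenericTypeInfo (Kryo fallback): ${info.isInstanceOf[GenericTypeInfo[_]]}")
  }
}
```

If the last line prints `true`, the type is being treated as a generic type and serialized with Kryo.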

Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post, and I would like to replicate the experiment with some tailored changes to dig deeper into the topic. Is the source code on GitHub or somewhere else?


Thank you,
Andrea

Re: Flink Serialization and case class fields limit

Andrey Zagrebin
Hi Andrea,

The 22-field limit comes from Scala [1], not from Flink.
I am not aware of a repo for the post, but I have cc'ed Fabian; maybe he can point to one if it exists.

Best,
Andrey

[1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html

On 16 Nov 2018, at 13:10, Andrea Sella <[hidden email]> wrote:



Re: Flink Serialization and case class fields limit

Andrea Sella-2
Hi Andrey,

My bad, I forgot to mention that I am using Scala 2.11 and Flink 1.5.5, which is why I asked about the limitation.

If I recall correctly, CaseClassSerializer and CaseClassTypeInfo don't rely on the unapply and tupled functions, so I'd say that Flink doesn't have this kind of limitation with Scala 2.11. Correct?
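A small plain-Scala sketch of the point about Scala 2.11, with no Flink involved. `Wide23` and `Wide23Demo` are hypothetical names. It shows that a case class with 23 fields can still be constructed, copied, and read through the `Product` interface; according to the post linked as [1], it is `unapply` and `tupled` that are left out above 22 fields.

```scala
// Hypothetical case class with 23 fields, one more than Tuple22/Function22 allow.
case class Wide23(
  f1: Int = 0, f2: Int = 0, f3: Int = 0, f4: Int = 0, f5: Int = 0, f6: Int = 0,
  f7: Int = 0, f8: Int = 0, f9: Int = 0, f10: Int = 0, f11: Int = 0, f12: Int = 0,
  f13: Int = 0, f14: Int = 0, f15: Int = 0, f16: Int = 0, f17: Int = 0, f18: Int = 0,
  f19: Int = 0, f20: Int = 0, f21: Int = 0, f22: Int = 0, f23: Int = 0)

object Wide23Demo {
  def main(args: Array[String]): Unit = {
    val w = Wide23(f23 = 42)

    // Construction, field access and copy work as for any case class.
    println(w.f23)                 // 42
    println(w.copy(f1 = 7).f1)     // 7

    // The Product interface exposes every field by position.
    println(w.productArity)        // 23
    println(w.productElement(22))  // 42

    // Per [1], these are not generated above 22 fields on Scala 2.11,
    // so uncommenting them is expected to fail to compile there:
    // val t = Wide23.tupled
    // val u = Wide23.unapply(w)
  }
}
```

Code that reads fields by position and builds instances through the constructor therefore never needs a Tuple23 or Function23.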

Thank you,
Andrea
On Fri, 16 Nov 2018 at 19:34, Andrey Zagrebin <[hidden email]> wrote:


Re: Flink Serialization and case class fields limit

Fabian Hueske-2
Hi Andrea,

I wrote the post 2.5 years ago. Sorry, I don't think I kept the code anywhere, but the mechanics in Flink should still be the same.

Best, Fabian 

On Fri, 16 Nov 2018 at 20:06, Andrea Sella <[hidden email]> wrote: