Re: Collections within POJOs/tuples
Posted by
Stephan Ewen on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Collections-within-POJOs-tuples-tp1066p1082.html
Here are some rough guidelines for serialization efficiency in Flink:
- Tuples are a bit more efficient than POJOs, because they do not support (and therefore do not encode) possible subclasses, and they do not involve any reflection code at all.
- Arrays are more efficient than collections (collections go through the collection serializer in Scala, and sometimes through Kryo; arrays are handled directly).
- The efficiency of a "Tuple1<Type>" is virtually the same as that of "Type", since the tuple itself is never serialized. It exists only in the TypeInformation metadata, not in the serialized data.
- Arrays or lists as a top-level element are good. Don't put them into a POJO or a tuple unless you also need to add other fields.
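To make the array-versus-collection point a bit more concrete, here is a small toy model in plain Java. It is only a sketch and does not use Flink's serializers or reproduce Flink's wire format: the class `SerializationSketch` and its methods `directSize` and `taggedSize` are made up for illustration. It assumes that a serializer which knows the element type up front can write just the values, while a generic serializer that has to allow for arbitrary element types writes some kind of type tag per element (modelled here as the class name).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Flink's serializer code or wire format.
class SerializationSketch {

    // Element type is known in advance (as with a primitive array):
    // write the length, then the raw values, nothing else.
    static int directSize(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.size();
    }

    // Element type is not known in advance (generic collection):
    // each element is preceded by a type tag, modelled as its class name.
    static int taggedSize(List<Integer> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.size());
        for (Integer v : values) {
            out.writeUTF(v.getClass().getName()); // hypothetical per-element tag
            out.writeInt(v);
        }
        out.flush();
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        int[] array = {1, 2, 3};
        List<Integer> list = Arrays.asList(1, 2, 3);
        System.out.println("direct bytes: " + directSize(array));
        System.out.println("tagged bytes: " + taggedSize(list));
    }
}
```

The absolute numbers mean nothing for real Flink jobs; the sketch only shows why encoding type information alongside the data costs extra bytes and extra work per element.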
Let us know if you have more questions.