(DEPRECATED) Apache Flink User Mailing List archive.

Collections within POJOs/tuples


Kruse, Sebastian

Collections within POJOs/tuples

Hello everyone,

I was just wondering, which class would be most efficient to store collections of primitive elements, and which one to store objects, within POJOs and tuples from a serialization point of view. And would it make any difference if such a collection is not embedded within a POJO/tuple but is the "top-level element"?

Cheers,

Sebastian

Stephan Ewen

Re: Collections within POJOs/tuples

Here are some rough guidelines for serialization efficiency in Flink:

- Tuples are a bit more efficient than POJOs, because they do not support (and therefore do not encode) possible subclasses, and they do not involve any reflection code at all.

- Arrays are more efficient than collections (collections go through the collection serializer in Scala, and sometimes through Kryo, whereas arrays are handled directly).

- The efficiency of a "Tuple1<Type>" is virtually the same as that of "Type", since the Tuple itself is never serialized: it exists only in the TypeInformation metadata and not in the serialized data.

- Arrays or lists as a top-level element are good. Don't put them into a POJO or a tuple unless you also need to add other fields.
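
To give a rough feel for the array-versus-collection point, here is a minimal sketch in plain Java. It does not use Flink or its serializers at all. The class name ArrayVsCollectionSketch and both helper methods are made up for illustration, the hand-written array layout (a length followed by the raw values) is an assumption about what a direct array encoding can look like, and JDK object serialization merely stands in for a generic fallback path. Treat the byte counts as an analogy, not as a measurement of Flink.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain JDK code, no Flink classes involved.
public class ArrayVsCollectionSketch {

    // Direct encoding of a primitive array: a 4-byte length, then 4 bytes per int.
    static byte[] writeIntArray(int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Generic encoding of a collection of boxed values via JDK object
    // serialization, which also writes class metadata and per-object markers.
    static byte[] writeIntList(List<Integer> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(values));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] array = new int[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i);
        }
        System.out.println("int[] bytes:         " + writeIntArray(array).length);
        System.out.println("List<Integer> bytes: " + writeIntList(list).length);
    }
}
```

The array path writes exactly 4 + 4 * n bytes, while the generic path has to describe the container and every boxed element, so its output is larger. Flink's own collection and Kryo paths are far more compact than JDK serialization, so the gap there should be smaller than what this sketch prints.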

Let us know if you have more questions.

On Fri, Apr 17, 2015 at 8:54 AM, Kruse, Sebastian <[hidden email]> wrote:

Hello everyone,

I was just wondering, which class would be most efficient to store collections of primitive elements, and which one to store objects, within POJOs and tuples from a serialization point of view. And would it make any difference if such a collection is not embedded within a POJO/tuple but is the "top-level element"?

Cheers,

Sebastian