About serialization - When and What

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

About serialization - When and What

madan
Hi,

I am trying to understand serialization part when a environment is executed. Taking a simple environment for ex., env.addSource(...).keyBy(...).sum(...).addSink(...).execute. I would like to understand when and where serialization happens and what are all serialized, operators,functions etc.,   

Can anyone please give some information or point me at proper documentation.

--
Thank you,
Madan.
Reply | Threaded
Open this post in threaded view
|

Re: About serialization - When and What

Timo Walther
Hi Madan,

serialization happens at different positions with different mechanisms.

For records that are travelling in the stream the serializer that is defined by the type information is used (print env.addSource().getType()). However, whether records are serialized or not depends on the so-called object reuse mode [0] and if repartitioning is involved. By default, records are serialized between all operators because Flink Functions could mutually modify objects if a user does not pay special attention to it. If you know what you are doing, you can enable object reuse. In this case, serialization happens only for repartitioning, e.g. always after a keyBy().

For keeping state (like in sum()), a record must be serialized during a checkpoint or for writing it to the RocksDB state backend.

Instances of Functions are serialized using Java serialization (to ship member variables etc.).

I hope we can improve this part in the documentation in the near future.

Regards,
Timo


[0] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/execution_configuration.html

Am 12/18/17 um 1:15 PM schrieb madan:
Hi,

I am trying to understand serialization part when a environment is executed. Taking a simple environment for ex., env.addSource(...).keyBy(...).sum(...).addSink(...).execute. I would like to understand when and where serialization happens and what are all serialized, operators,functions etc.,   

Can anyone please give some information or point me at proper documentation.

--
Thank you,
Madan.