Hi Madan,
serialization happens at different positions with different
mechanisms.
For records that are travelling in the stream the serializer that
is defined by the type information is used (print
env.addSource().getType()). However, whether records are
serialized or not depends on the so-called object reuse mode [0]
and if repartitioning is involved. By default, records are
serialized between all operators because Flink Functions could
mutually modify objects if a user does not pay special attention
to it. If you know what you are doing, you can enable object
reuse. In this case, serialization happens only for
repartitioning, e.g. always after a keyBy().
For keeping state (like in sum()), a record must be serialized
during a checkpoint or for writing it to the RocksDB state
backend.
Instances of Functions are serialized using Java serialization (to
ship member variables etc.).
I hope we can improve this part in the documentation in the near
future.
Regards,
Timo
[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/execution_configuration.html
Am 12/18/17 um 1:15 PM schrieb madan:
Hi,
I am trying to understand serialization part when a
environment is executed. Taking a simple environment for ex.,
env.addSource(...).keyBy(...).sum(...).addSink(...).execute. I
would like to understand when and where serialization happens
and what are all serialized, operators,functions etc.,
Can anyone please give some information or point me at
proper documentation.
--