Serialization in Operator Chaining

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Serialization in Operator Chaining

Hicken, Jan
Hi folks,

I have a question regarding the serialization in Flink's operator
chaining:

Consider these two map functions: Map1<String, T> and Map2<T, String>

As I haven't disabled operator chaining in the environment, these two
functions will be chained into one operator when executing my job.

The thing is, that the serialization for objects of type T is quite
expensive and I'd like to avoid that as much as possible. Does Flink
actually serialize these objects under the hood even if the functions
run in the same operator? If so, is it possible to disable the
serialization somehow?

Kind regards,
Jan

signature.asc (549 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Serialization in Operator Chaining

Aljoscha Krettek
Hi,

If you use the DataSet API, there will be no serialisation between operations in a chain. If you use the DataStream API, there will be serialisation by default but you can disable that using executionEnv.getConfig().enableObjectReuse().

Hope that helps,
Aljoscha

> On 9. Nov 2017, at 13:57, Hicken, Jan <[hidden email]> wrote:
>
> Hi folks,
>
> I have a question regarding the serialization in Flink's operator
> chaining:
>
> Consider these two map functions: Map1<String, T> and Map2<T, String>
>
> As I haven't disabled operator chaining in the environment, these two
> functions will be chained into one operator when executing my job.
>
> The thing is, that the serialization for objects of type T is quite
> expensive and I'd like to avoid that as much as possible. Does Flink
> actually serialize these objects under the hood even if the functions
> run in the same operator? If so, is it possible to disable the
> serialization somehow?
>
> Kind regards,
> Jan