Is it possible to configure Flink pre-flight type serialization scanning?

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Is it possible to configure Flink pre-flight type serialization scanning?

John Tipper

Flink performs significant scanning during the pre-flight phase of a Flink application (https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html). The act of creating sources, operators and sinks causes Flink to scan the data types of the objects that are used within the topology of a given streaming flow as apparently Flink will try to optimise jobs based on this information.

Is this scanning configurable? Can I turn it off and just force Flink to use Kryo serialisation only and not need or use any of this scanned information?

I have a very large, deeply nested class in a proprietary library that was auto generated and Flink seems to get into a very large endless loop when scanning it that results in out of memory errors after running for several hours (the application never actually launches via env.execute(), even if I bump up the heap size significantly). The class has a number of circular references, i.e. class and its child classes contains references to other classes of the same type, is this likely to be a problem?


Many thanks,

John