Question about serialization and performance
Posted by Michael Latta on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Question-about-serialization-and-performance-tp24219.html
In running tests of flink jobs we are seeing some that yield really good performance (2.5M records in minutes) and others that are struggleing to get past 200k records processed. In the later case there are a large number of keys, and each key gets state in the form of 3 value states. One holds a string and the others hold a Map of Lists of events (JSONObject object subclasses with custom java serialization). There is also a MapState for each key that will hold one entry for each event matching that key (string->string).
The program starts out processing 1000-1500 records/sec (on my 4 year old laptop), and progressively gets slower and slower. it is about 400/sec when processing the 500,000th event.
When using JProfiler on the test (local environment running under Eclipse) it indicates 70-80% of the execution time is spent in Kryo serialization methods.
When using the MemoryStateBackend the above is true, the RocksDBStateBackend is about 1/2 to 2/3 the speed.
Any suggestions on how to reduce or identify the source of the serialization performance issue is welcome.
Michael