Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Posted by
Stephan Ewen on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Efficiency-for-Filter-then-Transform-filter-map-vs-flatMap-tp2644p2665.html
In a set of benchmarks a while back, we found that the chaining mechanism has some overhead right now, because of its abstraction. The abstraction creates iterators for each element and makes it hard for the JIT to specialize on the operators in the chain.
For purely local chains running at full speed, this overhead is observable: it can decrease throughput from about 25 million elements per core to 15-20 million elements per core. If your job does not reach that throughput, or is I/O bound, source bound, etc., it does not matter.
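If you care about very high performance, collapsing the code into one function helps, for example doing the filter and the transformation together in a single flatMap() instead of filter().map().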
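
To make the difference concrete, here is a minimal standalone sketch in plain Java. It does not use Flink's classes: `Collector`, `FlatMapper`, `chain` and `run` are hypothetical stand-ins written only for this illustration, and the example is not a benchmark. It only shows the shape of the two variants: in the chained one, every element is handed from the filter step to the map step through an extra per-element indirection, while the fused one does both in a single call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FusedFilterMap {

    /** Hypothetical stand-in for an output collector (not Flink's class). */
    interface Collector<T> {
        void collect(T value);
    }

    /** Hypothetical flatMap-style function: emits zero or more outputs per input. */
    interface FlatMapper<I, O> {
        void flatMap(I value, Collector<O> out);
    }

    // Step 1 of the chained variant: forward only non-empty strings.
    static final FlatMapper<String, String> FILTER =
            (s, out) -> { if (!s.isEmpty()) out.collect(s); };

    // Step 2 of the chained variant: map each string to its length.
    static final FlatMapper<String, Integer> MAP =
            (s, out) -> out.collect(s.length());

    // Fused variant: filter and transform inside one function.
    static final FlatMapper<String, Integer> FUSED =
            (s, out) -> { if (!s.isEmpty()) out.collect(s.length()); };

    /** Links two steps; each element passes through an intermediate collector. */
    static <A, B, C> FlatMapper<A, C> chain(FlatMapper<A, B> first, FlatMapper<B, C> second) {
        return (value, out) -> first.flatMap(value, mid -> second.flatMap(mid, out));
    }

    /** Feeds every input element through the function and gathers the outputs. */
    static <I, O> List<O> run(List<I> input, FlatMapper<I, O> fn) {
        List<O> result = new ArrayList<>();
        Collector<O> sink = result::add;
        for (I value : input) {
            fn.flatMap(value, sink);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("flink", "", "chain");
        System.out.println(run(input, chain(FILTER, MAP))); // prints [5, 5]
        System.out.println(run(input, FUSED));              // prints [5, 5]
    }
}
```

Both variants produce the same result; the fused one simply has fewer calls and objects between the input and the output for each element, which is the kind of code the JIT can specialize more easily.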