Hi all,
When I use out.collect() twice inside a faltMap, the output is sometimes and randomly skewed. Take this example: final StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(); env.generateSequence(1, 100000) .flatMap((Long t, Collector<String> out) -> { out.collect("line1"); out.collect("line2"); }) .writeAsText("test",FileSystem.WriteMode.OVERWRITE).setParallelism(1); env.execute("Test"); I expect the output to be line1 line2 line1 line2 ... But some resulting lines (18 out of 200000) were: line2 line2 and the same for line1. What could be the reason for this? Best, Yassine |
Hi! The source runs parallel (n tasks), but the sink has a parallelism of 1. The sink hence has to merge the parallel streams from the source, which happens based on arrival speed of the streams, i.e., its not deterministic. That's why you see the lines being mixed. Try running source and sink with the same parallelism, then no merge of streams needs to happen. You'll see then that per output file, the lines are correct. Stephan On Thu, Aug 11, 2016 at 2:29 PM, Yassin Marzouki <[hidden email]> wrote:
|
Indeed, using the same parallelism corrected the output. Thank you! On Thu, Aug 11, 2016 at 2:34 PM, Stephan Ewen <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |