Question about parallelism
Posted by Jerry Peng on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Question-about-parallelism-tp15023.html
Hello all,
I have a question about parallelism and partitioning in the
DataStreams API. In Flink, a user can the parallelism of a data
source as well as operators. So when I set the parallelism of a data
source e.g.
DataStream<String> text =
env.readTextFile(params.get("input")).setParallelism(5)
does this mean that the resulting "text" DataStream in going to be
partitioned into 5 partitions or does it mean that there are going to
be 5 parallel tasks that are going to run for this stage?
If the next operator is:
DataStream<Tuple2<String, Integer>> counts = text.flatMap(new
Tokenizer()).setParallelism(10)
and the parallelism is set to 10. Are there 10 parallel tasks
consuming from the 5 partitions? and how is the resulting "counts"
DataStream partitioned? into 10 partitions?
Thanks in advance!
Best,
Jerry