(DEPRECATED) Apache Flink User Mailing List archive.

Question about parallelism

Posted by Jerry Peng on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Question-about-parallelism-tp15023.html

Hello all,

I have a question about parallelism and partitioning in the
DataStreams API. In Flink, a user can the parallelism of a data
source as well as operators. So when I set the parallelism of a data
source e.g.

DataStream<String> text =
env.readTextFile(params.get("input")).setParallelism(5)

does this mean that the resulting "text" DataStream in going to be
partitioned into 5 partitions or does it mean that there are going to
be 5 parallel tasks that are going to run for this stage?

If the next operator is:

DataStream<Tuple2<String, Integer>> counts = text.flatMap(new
Tokenizer()).setParallelism(10)

and the parallelism is set to 10. Are there 10 parallel tasks
consuming from the 5 partitions? and how is the resulting "counts"
DataStream partitioned? into 10 partitions?

Thanks in advance!

Best,

Jerry