(DEPRECATED) Apache Flink User Mailing List archive.

Parallelism and Partitioning

Classic

List

Threaded

3 messages Options

Mohit Anchlia

Parallelism and Partitioning

What is the granularity of parallelism in flink? For eg: if I am reading from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2 consumer threads and allocates it on 2 separate task managers?

Also, it would be good to understand the difference between parallelism and partitioning that also could be distributed across task managers.

Mohit Anchlia

Re: Parallelism and Partitioning

Any information on this would be helpful.

On Thu, Feb 2, 2017 at 5:09 PM, Mohit Anchlia <[hidden email]> wrote:

What is the granularity of parallelism in flink? For eg: if I am reading from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2 consumer threads and allocates it on 2 separate task managers?

Also, it would be good to understand the difference between parallelism and partitioning that also could be distributed across task managers.

Ufuk Celebi

Re: Parallelism and Partitioning

In reply to this post by Mohit Anchlia

On Fri, Feb 3, 2017 at 2:09 AM, Mohit Anchlia <[hidden email]> wrote:
> What is the granularity of parallelism in flink? For eg: if I am reading
> from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
> consumer threads and allocates it on 2 separate task managers?

Yes, this will instantiate two instances of the Kafka source, the map
operator, and the sink. These parallel sub pipelines will be scheduled
to separate slots (that might happen to on the same TM). See here for
more details: https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html

Partitioning of state happens in key groups, which define a range of
keys. A single subtask is usually responsible for more than a single
key group. The key groups are the units of rescaling your program.
This is configurable via the setMaxParallelism() method on the
environment.