On Fri, Feb 3, 2017 at 2:09 AM, Mohit Anchlia <
[hidden email]> wrote:
> What is the granularity of parallelism in flink? For eg: if I am reading
> from Kafka -> map -> Sink and I say parallel 2 does it mean it creates 2
> consumer threads and allocates it on 2 separate task managers?
Yes, this will instantiate two instances of the Kafka source, the map
operator, and the sink. These parallel sub pipelines will be scheduled
to separate slots (that might happen to on the same TM). See here for
more details:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.htmlPartitioning of state happens in key groups, which define a range of
keys. A single subtask is usually responsible for more than a single
key group. The key groups are the units of rescaling your program.
This is configurable via the setMaxParallelism() method on the
environment.