Parallelism of Keyed Process Function

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Parallelism of Keyed Process Function

parti
Hi,

Here is a question related to parallelism of keyed-process-function that is applied to the KeyedStream. For some code that looks like this
myStream.keyBy(...)
    .process(new MyKeyedProcessFunction())
    .process(<someOtherProcessFunction>).setParallelism(10)

On a Flink cluster with 5 TM nodes each with 10 task slots, and Job parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get distributed across all the 5 TM nodes evenly. Those typically get assigned to one single TM node. However the 50 subtasks of <someOtherProcessFunction> are always spread evenly across the 5 TMs.

Am I missing something? How can I get those MyKeyedProcessFunction() subtasks distributed across all TM nodes evenly?

Thanks 
Arti
Reply | Threaded
Open this post in threaded view
|

Re: Parallelism of Keyed Process Function

Arvid Heise-3
Hi Arti,

This is nothing specific to KeyedProcessFunction, but the general way Flink distributes subtasks. The general idea is to use as few task managers as possible such that they are available for cluster downsizing or other concurrent jobs.

You can change this behavior through cluster.evenly-spread-out-slots configuration [1].




On Mon, Sep 14, 2020 at 5:33 PM Arti Pande <[hidden email]> wrote:
Hi,

Here is a question related to parallelism of keyed-process-function that is applied to the KeyedStream. For some code that looks like this
myStream.keyBy(...)
    .process(new MyKeyedProcessFunction())
    .process(<someOtherProcessFunction>).setParallelism(10)

On a Flink cluster with 5 TM nodes each with 10 task slots, and Job parallelism = 5, the 5 subtasks of MyKeyedProcessFunction() do not get distributed across all the 5 TM nodes evenly. Those typically get assigned to one single TM node. However the 50 subtasks of <someOtherProcessFunction> are always spread evenly across the 5 TMs.

Am I missing something? How can I get those MyKeyedProcessFunction() subtasks distributed across all TM nodes evenly?

Thanks 
Arti


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng