Flink groupBy

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink groupBy

Alieh
Hi All

Is there anyway in Flink to send a process to a reducer?

If I do "test.groupby(1).reduceGroup", each group is processed on one
reducer? And if the number of groups is more than the number of task
slots we have, does Flink distribute the process evenly? I mean if we
have for example groups of size 10, 5, 5 and we have two task slots, is
the process distributed in this way?

task slot1: group of size 10

task slot2: two groups of size 5

Best,

Alieh

Reply | Threaded
Open this post in threaded view
|

Re: Flink groupBy

Fabian Hueske-2
Hi Alieh,

Flink uses hash partitioning to assign grouping keys to parallel tasks by default.
You can implement a custom partitioner or use range partitioning (which has some overhead) to control the skew.

There is no automatic load balancing happening.

Best, Fabian

2017-04-19 14:42 GMT+02:00 Alieh <[hidden email]>:
Hi All

Is there anyway in Flink to send a process to a reducer?

If I do "test.groupby(1).reduceGroup", each group is processed on one reducer? And if the number of groups is more than the number of task slots we have, does Flink distribute the process evenly? I mean if we have for example groups of size 10, 5, 5 and we have two task slots, is the process distributed in this way?

task slot1: group of size 10

task slot2: two groups of size 5

Best,

Alieh