Re: load balancing of keys to operators

Posted by Timo Walther on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/load-balancing-of-keys-to-operators-tp12303p12306.html

Hi,

using keyBy Flink ensures that every set of records with same key is
send to the same operator, otherwise it would not be possible to process
them as a whole. It depends on your use case if it is also ok that
another operator processes parts of this set of records. You can
implement you own partition strategy to split your data more evenly
(https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#physical-partitioning).
But this depends on your knowledge of key spaces and load, Flink can not
know this in advance.

I hope that helps.

Regards,
Timo


Am 20/03/17 um 15:29 schrieb Sonex:

> I am using a simple streaming job where I use keyBy on the stream to process
> events per key. The keys may vary in number (few keys to thousands). I have
> noticed a behavior of Flink and I need clarification on that. When we use
> keyBy on the stream, flink assigns keys to parallel operators so each
> operator can handle events per key independently. Once a key is assigned to
> an operator, can the key change the operator on which it is assigned? From
> what I`ve seen the answer is no.
>
> For example, let`s assume that keys 1 and 2 are assigned to operator A and
> keys 3 and 4 are assigned to operator B. If there is a burst of data for key
> 1 at some later time point, but keys 2,3 and 4 have only few data will key 2
> be assigned to operator B to balance the load? If not is there a way to do
> that? And again if not, why flink does not do that?
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.