(DEPRECATED) Apache Flink User Mailing List archive.

keyBy using custom partitioner

Classic

List

Threaded

3 messages Options

madhu phatak

keyBy using custom partitioner

Hi,

How to use a custom partitioner in keyBy operation? As of now it's using hash partitioner to load balance across parallel tasks. I tried custom partitioning the schema before calling keyBy operation. It doesn't seem to preserve that partition.

Regards,

Madhukara Phatak
http://datamantra.io/

Stephan Ewen

Re: keyBy using custom partitioner

Hi!

You can currently not override the hash function used by "keyBy()". The reason is that this function is used in multiple places, for the stream partitioning, and also for the partitioning of state. Both have to be aligned.

What you can do is use "partitionCustom(...)" to use an arbitrary partitioner. However, you cannot window or access state using that...

If you want to partition in a particular way and use windows after that, you would currently have to do something like a a map function that generates a special key, and then use keyBy() on that.

Greetings,

Stephan

On Wed, Mar 9, 2016 at 10:07 AM, madhu phatak <[hidden email]> wrote:

Hi,
How to use a custom partitioner in keyBy operation? As of now it's using hash partitioner to load balance across parallel tasks. I tried custom partitioning the schema before calling keyBy operation. It doesn't seem to preserve that partition.

--
Regards,
Madhukara Phatak
http://datamantra.io/

madhu phatak

Re: keyBy using custom partitioner

Hi,
Thank you.

On Mar 9, 2016 5:27 PM, "Stephan Ewen" <[hidden email]> wrote:

Hi!

You can currently not override the hash function used by "keyBy()". The reason is that this function is used in multiple places, for the stream partitioning, and also for the partitioning of state. Both have to be aligned.

What you can do is use "partitionCustom(...)" to use an arbitrary partitioner. However, you cannot window or access state using that...

If you want to partition in a particular way and use windows after that, you would currently have to do something like a a map function that generates a special key, and then use keyBy() on that.

Greetings,
Stephan

On Wed, Mar 9, 2016 at 10:07 AM, madhu phatak <[hidden email]> wrote:
Hi,
How to use a custom partitioner in keyBy operation? As of now it's using hash partitioner to load balance across parallel tasks. I tried custom partitioning the schema before calling keyBy operation. It doesn't seem to preserve that partition.

--
Regards,
Madhukara Phatak
http://datamantra.io/