keyBy using custom partitioner

madhu phatak
Hi,
How can I use a custom partitioner in a keyBy operation? Right now keyBy uses a hash partitioner to balance load across parallel tasks. I tried applying a custom partitioning before calling keyBy, but that partitioning does not seem to be preserved.

--
Regards,
Madhukara Phatak
http://datamantra.io/

Re: keyBy using custom partitioner

Stephan Ewen
Hi!

You currently cannot override the hash function used by "keyBy()". The reason is that this function is used in multiple places: for the stream partitioning and also for the partitioning of state. The two have to be aligned.

What you can do is use "partitionCustom(...)" to apply an arbitrary partitioner. However, the result is not a keyed stream, so you cannot use keyed windows or access keyed state on it.
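
To make the partitioner idea concrete, here is a minimal sketch of a range-based partitioner. It assumes integer keys in a known range; RangePartitioner and maxKey are hypothetical names, and the Partitioner interface is declared locally as a stand-in with the same shape as Flink's Partitioner<K> (a single method, int partition(K key, int numPartitions)), so that the sketch runs without Flink on the classpath.

```java
// Local stand-in mirroring the shape of Flink's Partitioner<K> interface.
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical partitioner: assigns integer keys to channels by range, not by hash.
class RangePartitioner implements Partitioner<Integer> {
    private final int maxKey; // assumed exclusive upper bound on keys

    RangePartitioner(int maxKey) {
        this.maxKey = maxKey;
    }

    @Override
    public int partition(Integer key, int numPartitions) {
        // Clamp out-of-range keys so the result is always a valid channel index.
        int clamped = Math.max(0, Math.min(key, maxKey - 1));
        // Split [0, maxKey) into numPartitions contiguous ranges.
        return (int) ((long) clamped * numPartitions / maxKey);
    }
}

class PartitionCustomSketch {
    public static void main(String[] args) {
        Partitioner<Integer> p = new RangePartitioner(100);
        for (int key : new int[] {0, 24, 25, 99}) {
            System.out.println(key + " -> channel " + p.partition(key, 4));
        }
    }
}
```

With the real Flink interface, a class like this would be passed to partitionCustom together with a KeySelector that extracts the key from each record. The result is a plain DataStream, which is why keyed windows and keyed state are not available afterwards.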

If you want to partition in a particular way and use windows after that, you would currently have to do something like a map function that generates a special key, and then use keyBy() on that.
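
A rough sketch of the "special key" approach, in plain Java so it runs without Flink. The Event type, the specialKey function and the range-based bucketing are all assumptions for illustration; groupBySpecialKey only imitates what keyBy() does logically (grouping records by key) and is not Flink code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SpecialKeySketch {
    // Hypothetical record type standing in for the stream's elements.
    static final class Event {
        final int userId;
        final String payload;

        Event(int userId, String payload) {
            this.userId = userId;
            this.payload = payload;
        }
    }

    // The "map" step: derive a special key that encodes the desired grouping.
    // Here user IDs in [0, maxUserId) are split into contiguous buckets.
    static int specialKey(Event e, int buckets, int maxUserId) {
        int clamped = Math.max(0, Math.min(e.userId, maxUserId - 1));
        return (int) ((long) clamped * buckets / maxUserId);
    }

    // Stand-in for keyBy(): collect payloads per special key, in arrival order.
    static Map<Integer, List<String>> groupBySpecialKey(
            List<Event> events, int buckets, int maxUserId) {
        return events.stream().collect(Collectors.groupingBy(
                e -> specialKey(e, buckets, maxUserId),
                TreeMap::new,
                Collectors.mapping(e -> e.payload, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(3, "a"), new Event(30, "b"),
                new Event(10, "c"), new Event(99, "d"));
        System.out.println(groupBySpecialKey(events, 4, 100));
        // prints {0=[a, c], 1=[b], 3=[d]}
    }
}
```

In a Flink job the same specialKey logic would sit in the map function (or in a KeySelector passed to keyBy()). Note that keyBy() still hashes the special key, so this controls which records are grouped together, not which parallel task a group runs on.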

Greetings,
Stephan


Re: keyBy using custom partitioner

madhu phatak

Hi,
Thank you.
