(DEPRECATED) Apache Flink User Mailing List archive.

Custom Partitioning with keyed state


Heidi Hazem Mohamed

Custom Partitioning with keyed state

Hi,

What I want: I have my own partitioning technique that generates keys for DataStream tuples. The number of distinct keys equals the number of nodes in the cluster: for example, if I set the parallelism to 4, the generated keys are 0, 1, 2 and 3, and so on for other parallelisms. Each key should then be sent to its own node, so that I can do further keyed processing there using keyed state.

What happened: I implemented my logic with keyBy so that I could use keyed state, but it suffers from heavy skew: some nodes received no records at all, while others received more than one key. I also tried custom partitioning; it did the physical partitioning I want, but I cannot use keyed state with it without keyBy.

What I expect (questions): Is there a way to control the skew, or to force the keys to be spread evenly over the available nodes? Is there a way to override the partitioning technique used by keyBy? Or is there a way to use keyed state with custom partitioning?

Best Regards

Heidy Hazem

Heidy Hazem, Teaching Assistant, School of Information Technology and Computer Science (formerly CIT, Communication Engineering, and Information Technology School)
T: +201000 63-25-63 office: UB1-room 701

26th July Corridor, Sheikh Zayed, Giza, Egypt
www.nu.edu.eg | www.facebook.com/NileUniversity
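A note on the skew described in this message: as far as I understand Flink's implementation (KeyGroupRangeAssignment), keyBy does not send key i to subtask i. It scrambles key.hashCode() with a murmur hash, takes the result modulo the max parallelism to pick a key group (128 key groups by default for parallelism 4), and gives each subtask a contiguous range of key groups, i.e. subtask = keyGroup * parallelism / maxParallelism with integer division. With four keys and four subtasks, and assuming the hash spreads keys uniformly, all four keys land on different subtasks with probability only 4!/4^4 = 24/256, about 9%, so collisions like the one reported are the expected case. The JDK-only sketch below re-implements that routing so you can print where keys 0 to 3 go. It mirrors the Flink 1.9-era sources as I recall them and is not Flink's own code; the class name KeyGroupRouting is hypothetical, so check the logic against your Flink version.

```java
// Sketch, assuming Flink's default key-group routing (KeyGroupRangeAssignment and
// MathUtils.murmurHash, Flink ~1.9). Plain JDK only; this is not Flink's own code.
final class KeyGroupRouting {

    // Murmur-style scrambling applied to key.hashCode() before choosing a key group.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        code = bitMix(code);
        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        }
        return 0;
    }

    // Final avalanche step of the 32-bit murmur3 hash.
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Smallest power of two >= x (for x >= 1).
    static int roundUpToPowerOfTwo(int x) {
        x = x - 1;
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        return x + 1;
    }

    // Default max parallelism when none is set: 1.5 * parallelism rounded up
    // to a power of two, clamped to [128, 32768].
    static int defaultMaxParallelism(int parallelism) {
        int candidate = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(Math.max(candidate, 128), 32768);
    }

    // Step 1: key -> key group in [0, maxParallelism).
    static int keyGroupFor(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: contiguous ranges of key groups -> subtask index in [0, parallelism).
    static int subtaskForKeyGroup(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int subtaskFor(Object key, int parallelism, int maxParallelism) {
        return subtaskForKeyGroup(keyGroupFor(key, maxParallelism), parallelism, maxParallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = defaultMaxParallelism(parallelism);
        // Integer keys 0..3, like the ones produced by the custom key generator.
        for (int key = 0; key < parallelism; key++) {
            System.out.printf("key %d -> key group %d -> subtask %d%n",
                    key, keyGroupFor(key, maxParallelism),
                    subtaskFor(key, parallelism, maxParallelism));
        }
    }
}
```

The point of the sketch is that the subtask depends on where the hashed key falls among the key groups, not on the key's numeric value, so a small, dense key range such as 0..3 gives no guarantee of a one-to-one spread.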

Congxian Qiu

Re: Custom Partitioning with keyed state

Have you tried the key selector function[1]?

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions

Best,

Congxian
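One workaround that fits the key selector suggestion, sketched here as an assumption rather than as an answer given in this thread: keep Flink's own partitioner, but for each subtask i search for a surrogate integer key that Flink's key-group routing already sends to subtask i, and key the stream by that surrogate instead of by the logical key 0..p-1. The search re-implements the routing (murmur hash of hashCode(), modulo the max parallelism, then keyGroup * parallelism / maxParallelism) with the JDK only, mirroring the Flink 1.9-era sources as I recall them. SubtaskKeyTable and buildTable are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical workaround sketch: find one surrogate key per subtask that Flink's
// default key-group routing (re-implemented here with the JDK, assumed to match
// KeyGroupRangeAssignment in Flink ~1.9) already sends to that subtask.
final class SubtaskKeyTable {

    // Murmur-style scrambling that Flink applies to key.hashCode().
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        code = bitMix(code);
        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        }
        return 0;
    }

    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Subtask that keyBy would pick for an Integer key (Integer.hashCode(k) == k).
    static int subtaskFor(int key, int parallelism, int maxParallelism) {
        int keyGroup = murmurHash(key) % maxParallelism;
        return keyGroup * parallelism / maxParallelism;
    }

    // table[i] = smallest non-negative int that is routed to subtask i.
    static int[] buildTable(int parallelism, int maxParallelism) {
        int[] table = new int[parallelism];
        Arrays.fill(table, -1);
        int found = 0;
        // candidate >= 0 stops the loop if the int range is ever exhausted.
        for (int candidate = 0; found < parallelism && candidate >= 0; candidate++) {
            int subtask = subtaskFor(candidate, parallelism, maxParallelism);
            if (table[subtask] == -1) {
                table[subtask] = candidate;
                found++;
            }
        }
        if (found < parallelism) {
            throw new IllegalStateException("no surrogate key found for some subtask");
        }
        return table;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = 128; // assumed to be set explicitly in the job
        int[] table = buildTable(parallelism, maxParallelism);
        for (int logicalKey = 0; logicalKey < parallelism; logicalKey++) {
            System.out.printf("logical key %d -> surrogate key %d -> subtask %d%n",
                    logicalKey, table[logicalKey],
                    subtaskFor(table[logicalKey], parallelism, maxParallelism));
        }
    }
}
```

In the job itself, a KeySelector (as suggested in the reply) could return table[logicalKey] for each tuple, so that keyBy, and therefore keyed state, sees the surrogate key and logical key i ends up on subtask i. The table is only valid for the exact parallelism and max parallelism it was built with, so both should be set explicitly (for example with setParallelism and StreamExecutionEnvironment.setMaxParallelism) and the table rebuilt if either changes. Flink also ships an experimental DataStreamUtils.reinterpretAsKeyedStream for streams that are already partitioned exactly as keyBy would partition them; as far as I know it does not turn an arbitrary custom partitioner into a keyed stream.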

In reply to Heidi Hazem Mohamed <[hidden email]>, Sunday, 27 October 2019, 11:04 PM.