Hi,
What I want: I have my own partitioning technique that generates keys for DataStream tuples. The number of distinct keys equals the number of nodes in the cluster: for example, if I set the parallelism to 4, the generated keys are 0, 1, 2, and 3.
Every record with a given key should then be sent to the same node, so that I can do further keyed processing using keyed state.
What happened: I implemented my logic with keyBy so that I could use keyed state, but the result is heavily skewed: some nodes received no records, while others received the records of more than one key. I also tried custom partitioning. It did the physical
partitioning as I wanted, but I cannot use keyed state with it, because keyed state requires keyBy.
What I expect (questions): Is there a way to control the skew, or to force the keys to be spread evenly over the available nodes? Is there a way to override the partitioning used by keyBy? Or is there a way to use keyed state with custom partitioning?
Best Regards
Heidy Hazem
Heidy Hazem, Teaching Assistant, School of Information Technology and Computer Science (formerly CIT, Communication Engineering, and Information Technology School)
26th July Corridor, Sheikh Zayed, Giza, Egypt
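
A note on the skew described above, offered as a sketch rather than a definitive answer. As far as I understand, keyBy does not send key k to subtask k: it hashes the key into a key group (a hash of the key's hashCode, modulo the max parallelism) and then maps the key group to a subtask as keyGroup * parallelism / maxParallelism. With only as many distinct keys as subtasks, collisions are likely. One common workaround is to search once for a "surrogate" key per subtask and to key by the surrogate instead of the raw partition id. In the sketch below, `SurrogateKeys`, `standInHash`, `operatorIndex`, and `findSurrogateKeys` are hypothetical names, and `standInHash` is only a stand-in for Flink's internal hash. In a real job, `operatorIndex` would be replaced by `KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism)` from flink-runtime, with the job's actual max parallelism.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

final class SurrogateKeys {

    // Stand-in integer mixer (murmur3-style finalizer). ASSUMPTION: this is
    // NOT Flink's exact hash; it only imitates a well-mixing hash function.
    static int standInHash(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // key -> key group -> subtask index, following the two-step scheme
    // described in the lead-in.
    static int operatorIndex(int key, int maxParallelism, int parallelism,
                             IntUnaryOperator hash) {
        int keyGroup = Math.floorMod(hash.applyAsInt(key), maxParallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    // For each subtask index i, find the smallest candidate key that the
    // assignment above routes to subtask i.
    static int[] findSurrogateKeys(int parallelism, int maxParallelism,
                                   IntUnaryOperator hash) {
        int[] surrogate = new int[parallelism];
        Arrays.fill(surrogate, -1);
        int found = 0;
        for (int candidate = 0; found < parallelism && candidate < 1_000_000; candidate++) {
            int idx = operatorIndex(candidate, maxParallelism, parallelism, hash);
            if (surrogate[idx] == -1) {
                surrogate[idx] = candidate;
                found++;
            }
        }
        if (found < parallelism) {
            throw new IllegalStateException("no surrogate key found for some subtask");
        }
        return surrogate;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = 128; // assumed value; use the job's real setting
        int[] surrogate = findSurrogateKeys(parallelism, maxParallelism,
                SurrogateKeys::standInHash);
        for (int i = 0; i < parallelism; i++) {
            System.out.println("partition id " + i + " -> surrogate key " + surrogate[i]);
        }
    }
}
```

With such a table, the stream would be keyed by `surrogate[partitionId]` rather than by `partitionId`, so keyed state stays available while each partition id lands on its own subtask. The table has to be recomputed whenever the parallelism or max parallelism changes.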
Hi,
Have you tried the key selector function [1]?
Heidi Hazem Mohamed <[hidden email]> wrote on Sunday, October 27, 2019 at 11:04 PM: