How to use operator list state like a HashMap?

Posted by Tony Wei on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/How-to-use-operator-list-state-like-a-HashMap-tp15701.html

Hi,

I have a basic streaming job that continuously persists data from Kafka to S3.
The data is grouped by some dimensions, up to a limited amount per group.

Originally, I used 'keyBy' and keyed state to fulfill this requirement.
However, because the data is extremely skewed, I switched to a map function that aggregates data for only some partitions, so that I can balance the amount of data across subtasks.

I use a HashMap inside the map function to store data by dimension, and convert it to operator list state when 'snapshot()' is called.
But this creates another problem. Because I can't access operator list state directly, the way keyed state can be accessed in a KeyedStream, I have to keep that state on the heap. This limits the amount of data I can cache in the map function.
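
For illustration, the pattern described above might look roughly like the following plain-Java sketch. It deliberately uses no Flink classes: 'snapshot()' and 'restore()' stand in for the places where a Flink function implementing CheckpointedFunction would fill and read its operator ListState (snapshotState and initializeState). The class name DimensionBuffer, the Entry type, the String record type, and the per-dimension limit are all assumptions made for the example, not details from the actual job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Plain-Java stand-in for the buffering pattern; all names here are hypothetical.
public class DimensionBuffer {

    // One flattened state element: a dimension key plus its buffered records.
    public static final class Entry {
        public final String dimension;
        public final List<String> records;

        public Entry(String dimension, List<String> records) {
            this.dimension = dimension;
            this.records = records;
        }
    }

    private final int limit; // assumed flush threshold per dimension
    // Lives on the heap between checkpoints, which is the limitation in question.
    private final Map<String, List<String>> buffer = new HashMap<>();

    public DimensionBuffer(int limit) {
        this.limit = limit;
    }

    // Buffers a record; returns the completed batch once the limit is reached.
    public Optional<List<String>> add(String dimension, String record) {
        List<String> batch = buffer.computeIfAbsent(dimension, k -> new ArrayList<>());
        batch.add(record);
        if (batch.size() >= limit) {
            buffer.remove(dimension);
            return Optional.of(batch);
        }
        return Optional.empty();
    }

    // Flattens the map into a list, as would be done before writing list state.
    public List<Entry> snapshot() {
        List<Entry> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : buffer.entrySet()) {
            out.add(new Entry(e.getKey(), new ArrayList<>(e.getValue())));
        }
        return out;
    }

    // Rebuilds the map from restored list elements, merging duplicate dimensions.
    public void restore(List<Entry> restored) {
        for (Entry e : restored) {
            buffer.computeIfAbsent(e.dimension, k -> new ArrayList<>()).addAll(e.records);
        }
    }

    public int bufferedDimensions() {
        return buffer.size();
    }
}
```

Merging in 'restore()' matters because, after rescaling, one subtask may receive list elements for the same dimension from several former subtasks. The whole map still sits on the heap between checkpoints, so the sketch reproduces the limitation rather than solving it.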

I was wondering whether there is any good suggestion for dealing with this problem, or a better way to use operator list state in this scenario. Thank you.


Best Regards,
Tony Wei