Hi,
I would like to ask which has the larger memory footprint and which is more efficient: fewer keys with a larger state per key, or many keys with a smaller state per key. For this use case I'm using the RocksDB StateBackend, and the state TTL is, well... infinite, so I'm keeping the state in Flink forever.

The use case: I have a stream of messages that I have to process in a custom way. I can take one of two approaches:

1. Use a keyBy that gives me some number of distinct keys, where the state for each key is significant in size. It will be a MapState in this case. This keyBy still lets me spread the work across operator instances.
2. Use a different keyBy that gives me a huge number of distinct keys, where the state for each key is very small. It will be a ValueState in this case.

To sum up: a "reasonable" number of keys with a very big state per key vs. a huge number of keys with a very small state each. What are the pros and cons of both?

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi,

I'll give some information from my side:

1. The performance of RocksDB is mainly determined by (de)serialization and disk reads/writes.
2. MapState only needs to (de)serialize the single map key/map value being read or written, whereas ValueState needs to (de)serialize the whole state on every read or write.
3. Disk read/write is somewhat about the whole state size.

Best,
Congxian

KristoffSC <[hidden email]> wrote on Wed, Apr 8, 2020 at 2:41 AM:
Hi,
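Point 2 above can be illustrated with a small model. This is only a sketch in plain Java, not Flink or RocksDB code: a `TreeMap` stands in for RocksDB, the class and method names are hypothetical, and "bytes touched" is an assumed stand-in for (de)serialization cost. It compares a per-entry layout (MapState on the RocksDB backend keeps each map entry as its own key-value pair, as far as I understand it) with a single blob per key (what a ValueState holding a whole collection would need).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified model, NOT Flink or RocksDB code. All names here are hypothetical.
class SerializationCostSketch {

    // Per-entry layout: one store entry per (flinkKey, mapKey).
    // A point lookup touches only the bytes of the one value it asks for.
    static int bytesTouchedPerEntry(Map<String, String> userMap, String flinkKey, String mapKey) {
        TreeMap<String, byte[]> store = new TreeMap<>();
        for (Map.Entry<String, String> e : userMap.entrySet()) {
            store.put(flinkKey + "#" + e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return store.get(flinkKey + "#" + mapKey).length;
    }

    // Whole-value layout: the entire map is one value, so any read or write
    // has to go through the bytes of every key and every value.
    static int bytesTouchedWholeValue(Map<String, String> userMap) {
        int total = 0;
        for (Map.Entry<String, String> e : userMap.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Tiny sample data set, made up for illustration only.
    static Map<String, String> sampleMap() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("a", "xx");
        m.put("b", "yyyy");
        m.put("c", "z");
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = sampleMap();
        System.out.println("per-entry read of 'b': " + bytesTouchedPerEntry(m, "user-1", "b") + " bytes");
        System.out.println("whole-value read: " + bytesTouchedWholeValue(m) + " bytes");
    }
}
```

The gap only matters when a ValueState holds a collection. A ValueState that holds a single String or a small POJO touches about as many bytes as one MapState entry would.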
Thanks Congxian Qiu,
I'm aware of your second point. In the ValueState I will keep a String or a very simple POJO, without any collections inside.

I didn't get your third point, could you clarify it please?
"disk read/write is somewhat about the whole state size"

Actually, what I would keep in the ValueState is what would otherwise be kept in a single MapState entry. Depending on which key I choose, my state can be "broader", in which case I will use MapState, or very narrow, in which case I can use a ValueState that holds only one entry. This is the essence of my question: what are the trade-offs here?
Hi,

In my last email I just wanted to say that the overall state size (and the access pattern, but I assume the access pattern is the same for both kinds of state) affects the final performance, which has to do with RocksDB's architecture. If using MapState and using ValueState end up with about the same state size on each subtask, then there is no difference in this respect.

Best,
Congxian

KristoffSC <[hidden email]> wrote on Wed, Apr 8, 2020 at 3:36 PM:
Thanks Congxian Qiu,
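The "same overall state size" argument can be sketched with a toy model. This is plain Java under stated assumptions, not Flink or RocksDB code: a `TreeMap` stands in for the RocksDB instance of one subtask, the key formats and the `"payload"` value are made up, and the class and method names are hypothetical. Real key encodings (key group, key, namespace, map key) differ from these strings by a few bytes per entry.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the two keying choices. NOT Flink code; all names are hypothetical.
class StateLayoutSketch {

    // Approach 1: few Flink keys, each with a MapState of many entries.
    // Modelled as one store entry per (flinkKey, mapKey) pair.
    static TreeMap<String, String> fewKeysWithMapState(int keys, int entriesPerKey) {
        TreeMap<String, String> store = new TreeMap<>();
        for (int k = 0; k < keys; k++) {
            for (int e = 0; e < entriesPerKey; e++) {
                store.put("key-" + k + "|entry-" + e, "payload");
            }
        }
        return store;
    }

    // Approach 2: the Flink key is the finer-grained composite (k, e),
    // and each key holds a single small ValueState.
    static TreeMap<String, String> manyKeysWithValueState(int keys, int entriesPerKey) {
        TreeMap<String, String> store = new TreeMap<>();
        for (int k = 0; k < keys; k++) {
            for (int e = 0; e < entriesPerKey; e++) {
                store.put("key-" + k + "/entry-" + e, "payload");
            }
        }
        return store;
    }

    // Rough size proxy: total characters across all keys and values.
    static long totalChars(TreeMap<String, String> store) {
        long total = 0;
        for (Map.Entry<String, String> entry : store.entrySet()) {
            total += entry.getKey().length() + entry.getValue().length();
        }
        return total;
    }

    public static void main(String[] args) {
        TreeMap<String, String> a = fewKeysWithMapState(10, 100);
        TreeMap<String, String> b = manyKeysWithValueState(10, 100);
        System.out.println("approach 1: " + a.size() + " entries, " + totalChars(a) + " chars");
        System.out.println("approach 2: " + b.size() + " entries, " + totalChars(b) + " chars");
    }
}
```

In this model both approaches produce the same number of store entries and the same total size, which matches the point that the choice between MapState and ValueState by itself should not change much on the RocksDB side when the data is the same.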