I'm using a trie tree to match prefixes efficiently in an operator applied to
a KeyedStream. It can grow quite large so I'd like to store the contents of the tree in a keyed MapState so I benefit from incremental checkpoints. Then, I'd just need to recreate the tree in memory from the MapState in case of failure. What I don't clearly see is how to have a tree per key without using a keyed ValueState. I guess that just using an instance variable is not recommended? or one operator is instantiated per each of the keys? Basically, I'd need a way to have variables in memory scoped by key and not managed as state by flink. Thanks, Gerard -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi,
I'm afraid that's not possible. One operator instance is in fact, as you suspected, responsible for processing elements with many different keys. Only if you use the keyed state abstractions (ValueState, MapState and such) is your state "per-key". Best, Aljoscha > On 31. Aug 2017, at 15:12, gerardg <[hidden email]> wrote: > > I'm using a trie tree to match prefixes efficiently in an operator applied to > a KeyedStream. It can grow quite large so I'd like to store the contents of > the tree in a keyed MapState so I benefit from incremental checkpoints. > Then, I'd just need to recreate the tree in memory from the MapState in case > of failure. > > What I don't clearly see is how to have a tree per key without using a keyed > ValueState. I guess that just using an instance variable is not recommended? > or one operator is instantiated per each of the keys? > > Basically, I'd need a way to have variables in memory scoped by key and not > managed as state by flink. > > Thanks, > > Gerard > > > > > -- > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Thanks Aljoscha,
So I can think of three possible solutions: * Use an instance dictionary to store the trie tree scoped by the same key that the KeyedStream. That should work if the lifetime of the operator object is tied to the keys that it processes. * Store the trie tree in a ValueState but somehow tell flink to not checkpoint this state. Also, it would be better if this could be in a MemoryStateBackend and the rest of the application state in a RocksDBStateBackend but I think backends can't be mixed in the same job. * Forget about the MapState and just store the tree in a ValueState and somehow control how it is checkpointed so I can manually implement incremental checkpoints. This seems kind of complex as I would need to keep track of the insertions and deletions to the tree from the last checkpoint. The first one is the most straightforward but I'm not sure if it is really feasible without knowing flink internals. Also, if I rely in how Flink internally manages objects and at some point it changes, it could introduce a bug difficult to detect. Could you provide some insight? Thanks, Gerard -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |