Operator variables in memory scoped by key

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Operator variables in memory scoped by key

gerardg
I'm using a trie tree to match prefixes efficiently in an operator applied to
a KeyedStream. It can grow quite large so I'd like to store the contents of
the tree in a keyed MapState so I benefit from incremental checkpoints.
Then, I'd just need to recreate the tree in memory from the MapState in case
of failure.

What I don't clearly see is how to have a tree per key without using a keyed
ValueState. I guess that just using an instance variable is not recommended?
or one operator is instantiated per each of the keys?

Basically, I'd need a way to have variables in memory scoped by key and not
managed as state by flink.

Thanks,

Gerard




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Operator variables in memory scoped by key

Aljoscha Krettek
Hi,

I'm afraid that's not possible. One operator instance is in fact, as you suspected, responsible for processing elements with many different keys. Only if you use the keyed state abstractions (ValueState, MapState and such) is your state "per-key".

Best,
Aljoscha

> On 31. Aug 2017, at 15:12, gerardg <[hidden email]> wrote:
>
> I'm using a trie tree to match prefixes efficiently in an operator applied to
> a KeyedStream. It can grow quite large so I'd like to store the contents of
> the tree in a keyed MapState so I benefit from incremental checkpoints.
> Then, I'd just need to recreate the tree in memory from the MapState in case
> of failure.
>
> What I don't clearly see is how to have a tree per key without using a keyed
> ValueState. I guess that just using an instance variable is not recommended?
> or one operator is instantiated per each of the keys?
>
> Basically, I'd need a way to have variables in memory scoped by key and not
> managed as state by flink.
>
> Thanks,
>
> Gerard
>
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Reply | Threaded
Open this post in threaded view
|

Re: Operator variables in memory scoped by key

gerardg
Thanks Aljoscha,

So I can think of three possible solutions:

* Use an instance dictionary to store the trie tree scoped by the same key
that the KeyedStream. That should work if the lifetime of the operator
object is tied to the keys that it processes.
* Store the trie tree in a ValueState but somehow tell flink to not
checkpoint this state. Also, it would be better if this could be in a
MemoryStateBackend and the rest of the application state in a
RocksDBStateBackend but I think backends can't be mixed in the same job.
* Forget about the MapState and just store the tree in a ValueState and
somehow control how it is checkpointed so I can manually implement
incremental checkpoints. This seems kind of complex as I would need to keep
track of the insertions and deletions to the tree from the last checkpoint.

The first one is the most straightforward but I'm not sure if it is really
feasible without knowing flink internals. Also, if I rely in how Flink
internally manages objects and at some point it changes, it could introduce
a bug difficult to detect.

Could you provide some insight?

Thanks,

Gerard
 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/