Is it possible to emulate keyed state with operator state?

Salva Alcántara
This is just for the sake of experimenting and learning. Let's assume that we
have a keyed process function using keyed state and we want to rewrite it using
operator state. The question is: would it be possible to keep exactly the same
behaviour? For example, one could use operator union list state and then set up
a timer to automatically remove any state that has not been used within a given
time. That would probably work, but I'd prefer a way to know, right after a
recovery/restore, which elements of the union list state to keep and which to
discard, depending on the set of keys the current operator instance has been
assigned. Is it possible to achieve this?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Is it possible to emulate keyed state with operator state?

David Anderson-2

Hypothetically, yes, I think this is possible to some extent. You would have to give up everything that requires a KeyedStream, such as timers and the RocksDB state backend, and performance would suffer.

As for the question of determining which key groups (and ultimately, which keys) are handled by a specific instance, see [1] and [2].

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-can-I-find-out-which-key-group-belongs-to-which-subtask-td32032.html
[2] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Use-keyBy-to-deterministically-hash-each-record-to-a-processor-task-slot-td16483.html
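
To make the key-group idea concrete, here is a minimal, self-contained sketch of how an operator instance could decide, after a restore, which entries of a union list state belong to it. It is not Flink code: the class KeyOwnership and all of its methods are hypothetical names, and spread() is only a stand-in for the murmur-style hash Flink applies to key.hashCode(), so the key groups it produces will not match Flink's. The two-step mapping (key to key group, key group to subtask index) follows my understanding of KeyGroupRangeAssignment. In a real job you would more likely call KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism) and compare the result with the subtask index from the runtime context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyOwnership {

    // Stand-in bit mixer (an assumption for this sketch). Flink uses its own
    // murmur-style hash on key.hashCode(), so real key groups will differ.
    static int spread(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & Integer.MAX_VALUE; // clear the sign bit so the modulo is non-negative
    }

    // Key -> key group in [0, maxParallelism).
    static int keyGroupFor(Object key, int maxParallelism) {
        return spread(key.hashCode()) % maxParallelism;
    }

    // Key group -> subtask index in [0, parallelism). Each subtask ends up
    // with a contiguous range of key groups.
    static int subtaskForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // True if the given subtask is responsible for this key.
    static boolean ownedBy(Object key, int subtaskIndex, int maxParallelism, int parallelism) {
        int keyGroup = keyGroupFor(key, maxParallelism);
        return subtaskForKeyGroup(keyGroup, maxParallelism, parallelism) == subtaskIndex;
    }

    // After a restore, union list state hands every subtask the full list.
    // Keep only the entries whose key maps to this subtask and drop the rest.
    static List<String> keepOwned(List<String> restoredKeys, int subtaskIndex,
                                  int maxParallelism, int parallelism) {
        List<String> mine = new ArrayList<>();
        for (String key : restoredKeys) {
            if (ownedBy(key, subtaskIndex, maxParallelism, parallelism)) {
                mine.add(key);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Hypothetical keys standing in for the restored union list state.
        List<String> restored = Arrays.asList("user-1", "user-2", "user-3", "user-4", "user-5");
        int maxParallelism = 128;
        int parallelism = 4;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            System.out.println("subtask " + subtask + " keeps "
                    + keepOwned(restored, subtask, maxParallelism, parallelism));
        }
    }
}
```

Note that this filter only helps if records are routed to subtasks by the same mapping and the same max parallelism, otherwise an instance would keep state for keys it never receives.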


(In reply to Salva Alcántara's message of Wed, Apr 8, 2020 at 9:10 AM.)