(DEPRECATED) Apache Flink User Mailing List archive.

Future of QueryableState

Maciek Próchniak

Future of QueryableState

Hello,

We are using QueryableState in some Nussknacker deployments as a nice
addition, allowing end users to peek inside job state for a given key
(we mostly use custom operators).

Judging by the mailing list and the feature radar proposed by Stephan:
https://github.com/StephanEwen/flink-web/blob/feature_radar/img/flink_feature_radar.svg

this feature is not widely used/supported. I'd like to ask:

- Are there any alternative ways of accessing state during job
execution? The State Processor API is very nice, but it operates on
checkpoints, and loading the whole state to look up one key seems a bit heavy.

- Are there any inherent problems in the QueryableState design (e.g. it's
not feasible to use it in Kubernetes settings, performance considerations), or
is it just a lack of interest/support (in that case we may offer some help)?

thanks,

maciek

Konstantin Knauf-3

Re: Future of QueryableState

Hi Maciek,

Thank you for reaching out. I'll try to answer your questions separately.

- Nothing comparable. You already mention the State Processor API. Besides that, I can only think of a side channel (CoFunction) that is used to request a certain state, which is then sent to a side output and ultimately to a sink, e.g. Kafka State Request Topic -> Flink -> Kafka State Response Topic. This puts the complexity into the Flink job, though.

- I think it is a combination of both. Queryable State works well within its limitations. In the case of the RocksDBStateBackend, these are mainly the availability of the job and the fact that you might read "uncommitted" state updates. In the case of the heap-backed state backends, there are also synchronization issues, e.g. you might read stale values. You also mention the fact that Queryable State has been an afterthought when it comes to more recent deployment options. I am not aware of any committer who currently has the time to work on this to the degree that would be required. So we thought it would be fairer and more realistic to mark Queryable State as "approaching end of life" in the sense that there is no active development on that component anymore.
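As a rough illustration of the side-channel idea in the first point, the following is a minimal, dependency-free Java sketch and not Flink code. Every name in it (StateRequestSketch, onEvent, onStateRequest, the per-key counter) is an assumption made for the example. In an actual job the two methods would presumably correspond to processElement1 and processElement2 of a KeyedCoProcessFunction on two connected streams keyed the same way, with responses written to a side output that feeds the response topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of the side-channel pattern; this is NOT Flink API.
// All names here are hypothetical and exist only for illustration.
final class StateRequestSketch {

    // Stands in for per-key operator state (keyed state in a real job).
    private final Map<String, Long> countPerKey = new HashMap<>();

    // Stands in for the regular output of the job.
    final List<String> mainOutput = new ArrayList<>();

    // Stands in for the side output that would feed the state response topic.
    final List<String> stateResponses = new ArrayList<>();

    // First input: regular events. Updates the state and emits a result.
    void onEvent(String key) {
        long updated = countPerKey.merge(key, 1L, Long::sum);
        mainOutput.add(key + "=" + updated);
    }

    // Second input: a request for the state of one key, e.g. read from a request topic.
    // The answer goes to the side output only; the state itself is not modified.
    void onStateRequest(String requestId, String key) {
        Long current = countPerKey.get(key);
        stateResponses.add(requestId + ":" + key + "=" + (current == null ? "absent" : current));
    }

    public static void main(String[] args) {
        StateRequestSketch operator = new StateRequestSketch();
        operator.onEvent("user-1");
        operator.onEvent("user-1");
        operator.onEvent("user-2");
        operator.onStateRequest("req-1", "user-1");
        operator.onStateRequest("req-2", "user-3");
        System.out.println(operator.stateResponses);
    }
}
```

Because requests pass through the same operator as regular events, every state that should be inspectable needs its own request and response plumbing, which is the extra complexity mentioned above.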

Best,

Konstantin

In reply to Maciek Próchniak's message of Tue, Mar 9, 2021 at 7:08 AM.

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Maciek Próchniak

Re: Future of QueryableState

Hi Konstantin,

Thanks for the detailed answer. I also thought about CoFunction, but it is a bit too heavy for us at the moment (each state would need an additional Kafka producer/consumer).

Guess we'll use QueryableState for now and try to phase it out slowly...

thanks,

maciek

In reply to Konstantin Knauf's message of 09.03.2021, 17:42.

Arvid Heise-4

Re: Future of QueryableState

Hi Maciek,

Thanks for reaching out. Only through these interactions do we know how important certain features are to users.

Queryable State has some limitations and makes the whole system rather fragile. Most users who try it out are disappointed that there is actually no SQL support. If we did support it, expensive queries would slow down the actual application... So if there is enough interest in the community, we would rather replace Queryable State with some way to replicate state to an external system that supports proper queries and has no influence on the live application.

FLIP-158 [1] was just accepted and would make it easier to replicate state onto an external system. Replicating to an external system is not planned yet, but it's one of the ideas that are floating around. Could you imagine having your Flink state replicated into some key/value store, log stream, or database for your use case? What would be your preference?

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints
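To make the replication idea more concrete, here is a minimal, dependency-free Java sketch. It is not Flink API and not what FLIP-158 specifies; all names (StateReplicationSketch, onEvent, applyPending, query) are assumptions for illustration. The job appends every state change to a log, a separate consumer applies the log to a key/value replica, and queries only ever read the replica.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// Dependency-free model of replicating state to an external store; this is NOT Flink API.
// All names here are hypothetical and exist only for illustration.
final class StateReplicationSketch {

    // Live state owned by the job; never read by queries.
    private final Map<String, Long> jobState = new HashMap<>();

    // Stands in for a log stream carrying (key, new value) changes.
    private final Queue<Map.Entry<String, Long>> changelog = new ArrayDeque<>();

    // Stands in for the external key/value store that serves queries.
    private final Map<String, Long> replica = new HashMap<>();

    // Job side: update the live state and append the change to the log.
    void onEvent(String key) {
        long updated = jobState.merge(key, 1L, Long::sum);
        changelog.add(Map.entry(key, updated));
    }

    // Replica side: drain the log into the external store; returns the number of changes applied.
    int applyPending() {
        int applied = 0;
        Map.Entry<String, Long> change;
        while ((change = changelog.poll()) != null) {
            replica.put(change.getKey(), change.getValue());
            applied++;
        }
        return applied;
    }

    // Query side: reads the replica only, so it cannot slow down or disturb the job.
    Optional<Long> query(String key) {
        return Optional.ofNullable(replica.get(key));
    }
}
```

Queries may see slightly stale values until applyPending has caught up, but they never touch the live job state.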

In reply to Maciek Próchniak's message of Wed, Mar 10, 2021 at 2:44 PM.