(DEPRECATED) Apache Flink User Mailing List archive.

Queryable State

Vishal Santoshi

Queryable State

We are making a few decisions on use cases where Queryable State is a natural fit: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/queryable_state.html

Is Queryable State production ready? We will move to Flink 1.5 if that helps to make the case for using it.

Vishal Santoshi

Re: Queryable State

To be more precise, can anything similar to https://engineering.linkedin.com/blog/2018/03/air-traffic-controller--member-first-notifications-at-linkedin , which was done in Samza, be replicated with production-level guarantees using Flink Queryable State (as it currently stands in version 1.5)?

Fabian Hueske-2

Re: Queryable State

Hi Vishal,

In general, Queryable State should be ready to use.
There are a few things to consider though:

- State queries are not synchronized with the application code, i.e., they can happen at the same time. Therefore, the Flink application should not modify objects that have been put into or read from the state unless you are using the RocksDBStateBackend (which creates copies by deserialization).

- State will be rolled back after a failure. Hence, a query can read writes that are not yet "committed by a checkpoint" and that may be gone after recovery.
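
The first caveat can be illustrated with a small simulation. This is only a sketch in plain Python, not Flink code: `HeapLikeState` and `SerializingState` are hypothetical stand-ins for a backend that keeps live object references and one that stores serialized bytes, and `pickle` stands in for Flink's serializers.

```python
import pickle


class HeapLikeState:
    """Hypothetical stand-in for a backend that keeps live object references."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        # Returns the very same object the application still holds.
        return self._store[key]


class SerializingState:
    """Hypothetical stand-in for a backend that stores serialized bytes."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        # Every read deserializes a fresh, independent copy.
        return pickle.loads(self._store[key])


# Reference-keeping backend: a mutation after put() leaks into what a query sees.
heap = HeapLikeState()
counts = {"clicks": 1}
heap.put("user-1", counts)
queried = heap.get("user-1")      # what a concurrent query would be handed
counts["clicks"] = 99             # application mutates the object afterwards
heap_sees_mutation = queried["clicks"] == 99

# Serializing backend: the stored value is isolated from later mutations.
ser = SerializingState()
counts2 = {"clicks": 1}
ser.put("user-1", counts2)
counts2["clicks"] = 99
ser_isolated = ser.get("user-1")["clicks"] == 1
```

In this toy model the reference-keeping store exposes the mutated value to the reader, while the serializing store does not, which is the reason the advice above only applies when copies are not made on access.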

@Kostas, did I forget something?

Best, Fabian
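
The rollback caveat can be sketched in the same spirit. The `CheckpointedState` class below is a hypothetical toy model of checkpoint and restore, written only to show the effect on a reader; it does not use or mirror Flink's API.

```python
import copy


class CheckpointedState:
    """Toy keyed state with checkpoint and restore (hypothetical, not Flink's API)."""

    def __init__(self):
        self.live = {}
        self._snapshot = {}

    def update(self, key, value):
        self.live[key] = value

    def checkpoint(self):
        # Persist a consistent copy of the current state.
        self._snapshot = copy.deepcopy(self.live)

    def restore(self):
        # After a failure, state goes back to the last checkpoint.
        self.live = copy.deepcopy(self._snapshot)

    def query(self, key):
        # Queries always see the live state, checkpointed or not.
        return self.live.get(key)


state = CheckpointedState()
state.update("user-1", 10)
state.checkpoint()
state.update("user-1", 11)                    # not covered by any checkpoint yet
read_before_failure = state.query("user-1")   # the uncommitted write is visible
state.restore()                               # simulated failure and recovery
read_after_recovery = state.query("user-1")   # the value has been rolled back
```

A client that acted on `read_before_failure` has seen a value that the job itself no longer remembers after recovery, so readers should treat query results as possibly provisional.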

Vishal Santoshi

Re: Queryable State

Those are understandable. I am more interested in a few other things (and maybe more could be added):

* As far as I understand, the JM is the SPOF. Does HA become a necessity?

* If there are two or more JMs, could we theoretically have an LB fronting them? That is, would it be peer-to-peer access (as in Cassandra) or a master-slave setup for JM HA, specifically for queryable access? (For Flink jobs it is master-slave.)

* Do we replicate state to other TMs for read optimization (specifically to avoid hot-node issues)?

* If the job goes down, it seems the state is not accessible. What plans do we have to "separate concerns" for Queryable State?

We consider Queryable State a significant feature that Flink provides, and we would do the necessary legwork if there are certain gaps in it being truly considered a highly available key-value store.

Regards.

Fabian Hueske-2

Re: Queryable State

Queryable state is "just" an additional feature on top of regular keyed state, i.e., the only difference is that you can read the state from an outside application.

Besides that, it behaves exactly like regular application state.

Queryable state is (at the moment) designed to be accessible only while a job is running.

If the job fails (and recovers) or is manually taken down for maintenance, the state cannot be queried.

It's not possible to put a load balancer in front of a JobManager. Only one JM is the active master that maintains a running job.

State is also not replicated.

Best, Fabian

Vishal Santoshi

Re: Queryable State

Are there plans to address all or a few of the above, apart from the "JM LB not possible" point, which seems understandable?

Fabian Hueske-2

Re: Queryable State

AFAIK, there have been discussions to replicate state among TMs to speed up recovery (and improve availability).

However, I'm not aware of plans to implement that.

I don't think serving state while a job is down has been considered yet.

Vishal Santoshi

Re: Queryable State

Thank you. These do look like show stoppers for us. But again thank you.

Kostas Kloudas

Re: Queryable State

Hi Vishal,

As Fabian said, queryable state is just a feature that exposes the state kept within Flink; it is not meant to replace functionality that would otherwise be provided by a sink. The functionality will definitely evolve in the future, but there are currently no discussions about keeping the state of a job available after the job is done.

To be able to do so, with exactly-once semantics and all the guarantees provided by Flink, I would recommend using an external sink.

Cheers,

Kostas
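
The external-sink recommendation can be sketched as follows. This is a plain-Python toy under stated assumptions, not Flink code: `ToyJob`, `JobDownError`, and the `external_store` dictionary (standing in for a database or key-value store) are all hypothetical, and the sketch does not model exactly-once delivery.

```python
class JobDownError(Exception):
    """Raised when state is queried while the toy job is not running."""


class ToyJob:
    """Hypothetical job that keeps a running sum per key and emits it to a sink."""

    def __init__(self, sink):
        self.running = True
        self._state = {}
        self._sink = sink

    def process(self, key, value):
        # Update internal state, then emit the new value to the external store.
        self._state[key] = self._state.get(key, 0) + value
        self._sink[key] = self._state[key]

    def query_state(self, key):
        # Mirrors the limitation discussed above: no running job, no queryable state.
        if not self.running:
            raise JobDownError("job is not running")
        return self._state[key]


external_store = {}   # stand-in for an external database or key-value store
job = ToyJob(external_store)
job.process("user-1", 2)
job.process("user-1", 3)

job.running = False   # job fails or is taken down for maintenance
try:
    job.query_state("user-1")
    state_query_failed = False
except JobDownError:
    state_query_failed = True

# The externally stored result is still readable while the job is down.
value_from_store = external_store["user-1"]
```

The design point is that readers depend on the external store's availability rather than on the job's, at the cost of operating a second system.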

Vishal Santoshi

Re: Queryable State

Thank you for the clarification.
