(DEPRECATED) Apache Flink User Mailing List archive.

State migration for sql job

Classic

List

Threaded

5 messages Options

aitozi

State migration for sql job

This post was updated on .

When use flink sql, we encounter a big problem to deal with sql state compatibility.
Think we have a group agg sql like

select sum(`a`) from source_t group by `uid`

But if i want to add a new agg column to

select sum(`a`), max(`a`) from source_t group by `uid`

Then sql state will not be compatible. Is there any on-going work/thoughts to improve this situation?

JING ZHANG

Re: State migration for sql job

Hi aitozi,

This is a popular demand that many users mentioned, which appears in user mail list for several times.

Unfortunately, it is not supported by Flink SQL yet, maybe would be solved in the future. BTW, a few company try to solve the problem in some specified user cases on their internal Flink version[2].

Currently, you may try use `State Processor API`[1] as temporary solution.

1. Do a savepoint

2. Generates updated the savepoint based on State Processor API

3. Recover from the new savepoint.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/libs/state_processor_api/

[2] https://developer.aliyun.com/article/781455

Best regards,

JING ZHANG

aitozi <[hidden email]> 于2021年6月8日周二下午1:54写道：

When use flink sql, we encounter a big problem to deal with sql state compatibility. Think we have a group agg sql like ```sql select sum(`a`) from source_t group by `uid` ``` But if i want to add a new agg column to ```sql select sum(`a`), max(`a`) from source_t group by `uid` ``` Then sql state will not be compatible. Is there any on-going work/thoughts to improve this situation?
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Kurt Young

Re: State migration for sql job

What kind of expectation do you have after you add the "max(a)" aggregation:

a. Keep summing a and start to calculate max(a) after you added. In other words, max(a) won't take the history data into account.

b. First process all the historical data to get a result of max(a), and then start to compute sum(a) and max(a) together for the real-time data.

Best,

Kurt

On Tue, Jun 8, 2021 at 2:11 PM JING ZHANG <[hidden email]> wrote:

Hi aitozi,
This is a popular demand that many users mentioned, which appears in user mail list for several times.
Unfortunately, it is not supported by Flink SQL yet, maybe would be solved in the future. BTW, a few company try to solve the problem in some specified user cases on their internal Flink version[2].
Currently, you may try use `State Processor API`[1] as temporary solution.
1. Do a savepoint
2. Generates updated the savepoint based on State Processor API
3. Recover from the new savepoint.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/libs/state_processor_api/
[2] https://developer.aliyun.com/article/781455

Best regards,
JING ZHANG

aitozi <[hidden email]> 于2021年6月8日周二下午1:54写道：
When use flink sql, we encounter a big problem to deal with sql state compatibility. Think we have a group agg sql like ```sql select sum(`a`) from source_t group by `uid` ``` But if i want to add a new agg column to ```sql select sum(`a`), max(`a`) from source_t group by `uid` ``` Then sql state will not be compatible. Is there any on-going work/thoughts to improve this situation?
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

aitozi

Re: State migration for sql job

Thanks for JING & Kurt's reply. I think we prefer to choose the option (a)
that will not take the history data into account.

IMO, if we want to process all the historical data, we have to store the
original data, which may be a big overhead to backend. But if we just
aggregate after the new added function, may just need a data format
transfer. Besides, the business logic we met only need the new aggFunction
accumulate with new data.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Yuval Itzchakov

Re: State migration for sql job

As my company is also a heavy user of Flink SQL, the state migration story is very important to us.

I as well believe that adding new fields should start to accumulate state from the point in time of the change forward.

Is anyone actively working on this? Is there anyway to get involved?

On Tue, Jun 8, 2021, 17:33 aitozi <[hidden email]> wrote:

Thanks for JING & Kurt's reply. I think we prefer to choose the option (a)
that will not take the history data into account.

IMO, if we want to process all the historical data, we have to store the
original data, which may be a big overhead to backend. But if we just
aggregate after the new added function, may just need a data format
transfer. Besides, the business logic we met only need the new aggFunction
accumulate with new data.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/