This post was updated on .
When use flink sql, we encounter a big problem to deal with sql state compatibility.
Think we have a group agg sql like select sum(`a`) from source_t group by `uid` But if i want to add a new agg column to select sum(`a`), max(`a`) from source_t group by `uid` Then sql state will not be compatible. Is there any on-going work/thoughts to improve this situation? |
Hi aitozi, This is a popular demand that many users mentioned, which appears in user mail list for several times. Unfortunately, it is not supported by Flink SQL yet, maybe would be solved in the future. BTW, a few company try to solve the problem in some specified user cases on their internal Flink version[2]. Currently, you may try use `State Processor API`[1] as temporary solution. 1. Do a savepoint 2. Generates updated the savepoint based on State Processor API 3. Recover from the new savepoint. Best regards, JING ZHANG aitozi <[hidden email]> 于2021年6月8日周二 下午1:54写道: When use flink sql, we encounter a big problem to deal with sql state compatibility. Think we have a group agg sql like ```sql select sum(`a`) from source_t group by `uid` ``` But if i want to add a new agg column to ```sql select sum(`a`), max(`a`) from source_t group by `uid` ``` Then sql state will not be compatible. Is there any on-going work/thoughts to improve this situation? |
What kind of expectation do you have after you add the "max(a)" aggregation: a. Keep summing a and start to calculate max(a) after you added. In other words, max(a) won't take the history data into account. b. First process all the historical data to get a result of max(a), and then start to compute sum(a) and max(a) together for the real-time data. Best, Kurt On Tue, Jun 8, 2021 at 2:11 PM JING ZHANG <[hidden email]> wrote:
|
Thanks for JING & Kurt's reply. I think we prefer to choose the option (a)
that will not take the history data into account. IMO, if we want to process all the historical data, we have to store the original data, which may be a big overhead to backend. But if we just aggregate after the new added function, may just need a data format transfer. Besides, the business logic we met only need the new aggFunction accumulate with new data. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
As my company is also a heavy user of Flink SQL, the state migration story is very important to us. I as well believe that adding new fields should start to accumulate state from the point in time of the change forward. Is anyone actively working on this? Is there anyway to get involved? On Tue, Jun 8, 2021, 17:33 aitozi <[hidden email]> wrote: Thanks for JING & Kurt's reply. I think we prefer to choose the option (a) |
Free forum by Nabble | Edit this page |