Flink SQL and checkpoints and savepoints

Dan
How well does Flink SQL work with checkpoints and savepoints? I tried to find documentation for this in v1.11 but couldn't find any.

E.g., what happens if the Flink SQL is modified between releases? New columns? Changed columns? Added joins?

Re: Flink SQL and checkpoints and savepoints

Timo Walther
Hi Dan,

currently, we cannot provide any savepoint guarantees between releases. Because SQL abstracts away the runtime operators, a future execution plan may look completely different, and we can then no longer map the old state onto it. This is unavoidable because the optimizer can get smarter as new optimizer rules are added.

For such cases, we recommend drying out the old pipeline and/or warming up a new pipeline with historic data when upgrading Flink. A change in columns sometimes works, but even this depends on the operators used.

Regards,
Timo
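
To make the plan-change risk concrete, one can compare the optimized plans of the old and the new query with EXPLAIN before upgrading. A minimal sketch, assuming registered clicks and users tables (all names here are made up for illustration):

-- Plan of the query that is currently running:
EXPLAIN PLAN FOR
SELECT user_id, COUNT(*) AS click_count
FROM clicks
GROUP BY user_id;

-- Plan after "just adding a join": the optimizer may pick different
-- operators or reorder them, so the state of the running job can no
-- longer be mapped onto the new plan.
EXPLAIN PLAN FOR
SELECT c.user_id, u.country, COUNT(*) AS click_count
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id
GROUP BY c.user_id, u.country;

If the stateful operators in the two plans differ, a savepoint taken from the old job will generally not restore into the new one.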



Re: Flink SQL and checkpoints and savepoints

Dan
Thanks Timo!

The reason makes sense.

Do any of the techniques make it easy to support exactly-once?

I'm inferring what is meant by "dry out". Are there any documented patterns for it? E.g., sending data to new Kafka topics between releases?

Re: Flink SQL and checkpoints and savepoints

Timo Walther
I would check the past Flink Forward conference talks and blog posts. A couple of companies have developed connectors, or modified existing ones, to make this work. Usually this is based on event timestamps or on some external control stream (with the DataStream API wrapped around the actual SQL pipeline to handle it).

There is also FLIP-150, which goes in this direction.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

Regards,
Timo
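
As a rough sketch of the event-timestamp-based hand-over (not an official pattern; the table names, topics, and cut-over timestamp are invented for illustration): the old job keeps processing events strictly before an agreed cut-over time and is then drained, while the new job writes to a new topic and handles events from the cut-over onwards, so each event is processed exactly once across the two jobs.

-- Old job (old Flink version / old query), writing to the existing sink:
INSERT INTO results_v1
SELECT user_id, COUNT(*) AS click_count
FROM clicks
WHERE event_time < TIMESTAMP '2021-02-01 00:00:00'
GROUP BY user_id;

-- New job (new Flink version / modified query), writing to a new topic
-- and only handling events from the cut-over onwards:
INSERT INTO results_v2
SELECT user_id, COUNT(*) AS click_count
FROM clicks
WHERE event_time >= TIMESTAMP '2021-02-01 00:00:00'
GROUP BY user_id;

The warm-up variant mentioned above would instead replay the full history into the new job (no event-time filter) and let downstream consumers switch from results_v1 to results_v2 at the cut-over time.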



Re: Flink SQL and checkpoints and savepoints

Dan
Is this savepoint recovery issue also true with the Flink Table API? I'd assume so. Just double-checking.


Re: Flink SQL and checkpoints and savepoints

Dan
I went through a few of the recent Flink Forward videos and didn't see solutions to this problem. It sounds like some companies have solutions, but they didn't describe them in enough detail to build something similar.

Re: Flink SQL and checkpoints and savepoints

Maximilian Michels
It is true that there are no strict upgrade guarantees.

However, looking at the code, it appears the RowSerializer supports adding new fields to Row, as long as no existing fields are modified or deleted. I haven't tried this out, but it looks like the code would only restore the existing fields and fill in the new ones with null values.

Please correct me if I'm wrong.

-Max
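
To make that concrete, the kind of change this would cover is appending a new field at the end of the row, e.g. (illustrative names; whether the savepoint actually restores still depends on the operators in the optimized plan, as noted above):

-- Query in the currently running job:
INSERT INTO results
SELECT user_id, COUNT(*) AS click_count
FROM clicks
GROUP BY user_id;

-- Modified query: one field appended at the end, nothing existing is
-- renamed, reordered, or removed (the sink table is assumed to have been
-- extended with the new column). If the RowSerializer behaves as described
-- above, restored rows would carry NULL for the new field.
INSERT INTO results
SELECT user_id, COUNT(*) AS click_count, MAX(event_time) AS last_click_time
FROM clicks
GROUP BY user_id;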

Re: Flink SQL and checkpoints and savepoints

Timo Walther
I agree with Max.

Within the same Flink release you can take savepoints and sometimes also change parts of the query. But the latter has to be evaluated on a case-by-case basis and needs to be tested.

Regards,
Timo
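
One way to run that per-change test is to take a savepoint of the running job and resubmit the modified query from it in a test environment; if the state cannot be mapped, the job fails at restore time rather than silently losing state. A sketch using the SQL Client (the execution.savepoint.path option exists in newer Flink releases, and the path and query below are made up, so verify this against your version):

-- Point the next submitted job at an existing savepoint (assumed path):
SET 'execution.savepoint.path' = 's3://my-bucket/savepoints/savepoint-abc123';

-- Submit the modified query; a restore failure here indicates the new
-- plan is not state-compatible with the old one.
INSERT INTO results
SELECT user_id, COUNT(*) AS click_count, MAX(event_time) AS last_click_time
FROM clicks
GROUP BY user_id;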
