Does job restart resume from last known internal checkpoint?

Moiz Jinia
In a checkpointed Flink job will doing a graceful restart make it resume from last known internal checkpoint? Or are all checkpoints discarded when the job is stopped?

If discarded, what will be the resume point?

Moiz

Re: Does job restart resume from last known internal checkpoint?

Timo Walther
Hi Moiz,

Yes, the job will be restarted in case of failure using the last
successful checkpoint. If you cancel the job, the checkpoints will be
discarded. That's why Flink has savepoints [1]: they store
checkpoints permanently (with additional meta-information). If there is
no checkpoint/savepoint, the job starts with empty state in all
operators, which also means that Kafka offsets are reset.

If you are interested in checkpointing internals, you can find more
information here [2].

Regards,
Timo

[1] https://data-artisans.com/blog/turning-back-time-savepoints
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html
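
To make the difference between failure recovery, cancellation, and savepoints concrete, here is a minimal toy model in plain Java. It is not Flink code and does not use any Flink API: the class CheckpointLifecycleSketch and all of its methods are hypothetical names made up for illustration, and a single long counter stands in for operator state such as a Kafka offset.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT Flink code) of the checkpoint/savepoint behaviour described above. */
final class CheckpointLifecycleSketch {
    private long state;          // stands in for operator state, e.g. a Kafka offset
    private Long lastCheckpoint; // latest internal checkpoint; null if there is none
    private final Map<String, Long> savepoints = new HashMap<>(); // survive cancellation

    void process(long records) { state += records; }

    /** Periodic internal checkpoint, owned by the running job. */
    void checkpoint() { lastCheckpoint = state; }

    /** User-triggered savepoint, stored outside the job's lifecycle. */
    void savepoint(String name) { savepoints.put(name, state); }

    /** Failure: restart from the last successful checkpoint, or empty state if none. */
    void recoverFromFailure() { state = (lastCheckpoint != null) ? lastCheckpoint : 0L; }

    /** Cancellation: internal checkpoints are discarded; savepoints are kept. */
    void cancel() { lastCheckpoint = null; }

    /** Start the job again, optionally from a savepoint (null means a fresh start). */
    void start(String savepointName) {
        Long restored = (savepointName == null) ? null : savepoints.get(savepointName);
        lastCheckpoint = restored; // the savepoint becomes the initial restore point
        state = (restored != null) ? restored : 0L;
    }

    long state() { return state; }

    public static void main(String[] args) {
        CheckpointLifecycleSketch job = new CheckpointLifecycleSketch();
        job.process(100);
        job.checkpoint();
        job.process(20);
        job.recoverFromFailure();
        System.out.println(job.state()); // prints 100: rolled back to the checkpoint

        job.savepoint("sp1");
        job.process(50);
        job.cancel();
        job.start(null);
        System.out.println(job.state()); // prints 0: checkpoints were discarded on cancel

        job.start("sp1");
        System.out.println(job.state()); // prints 100: resumed from the savepoint
    }
}
```

The sketch only captures the ownership rule from the answer above: checkpoints belong to the running job and disappear on cancel, while savepoints belong to the user and remain available as a resume point.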





Re: Does job restart resume from last known internal checkpoint?

Nico Kruber
Additionally, externalized checkpoints [3] may be retained after cancelling a
job. However, externalized checkpoints do not support rescaling (some
documentation improvements on this point are already pending in a PR [4]).


Nico

[3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/checkpoints.html
[4] https://github.com/apache/flink/pull/4033
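
As a rough illustration of "may be retained after cancelling", the following plain-Java sketch models the two cleanup modes. It is not Flink code: the enum constant names mirror Flink's CheckpointConfig.ExternalizedCheckpointCleanup, but the class ExternalizedCheckpointSketch, the Termination enum, and the survives method are hypothetical, and the rule encoded here is an assumption based on the linked 1.3 documentation [3], so please check it against the docs for your version.

```java
/** Toy model (NOT Flink code) of externalized-checkpoint cleanup when a job terminates. */
final class ExternalizedCheckpointSketch {

    /** Constant names mirror Flink's CheckpointConfig.ExternalizedCheckpointCleanup. */
    enum Cleanup { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }

    /** Hypothetical terminal states, limited to the two cases discussed in this thread. */
    enum Termination { FAILED, CANCELED }

    /** Returns true if the latest externalized checkpoint is still available afterwards. */
    static boolean survives(Cleanup cleanup, Termination termination) {
        if (termination == Termination.FAILED) {
            return true; // assumed: kept on failure under either mode
        }
        // On cancellation, only the RETAIN mode keeps the checkpoint.
        return cleanup == Cleanup.RETAIN_ON_CANCELLATION;
    }

    public static void main(String[] args) {
        for (Cleanup c : Cleanup.values()) {
            for (Termination t : Termination.values()) {
                System.out.println(c + " + " + t + " -> " + survives(c, t));
            }
        }
    }
}
```

With RETAIN_ON_CANCELLATION the retained checkpoint data has to be cleaned up manually once it is no longer needed.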




Re: Does job restart resume from last known internal checkpoint?

Moiz Jinia
Bump..



Re: Does job restart resume from last known internal checkpoint?

Nico Kruber
Hi Moiz,
didn't Timo's answer cover your questions?

See here in case you didn't receive it:
https://lists.apache.org/thread.html/a1a0d04e7707f4b0ac8b8b2f368110b898b2ba11463d32f9bba73968@%3Cuser.flink.apache.org%3E


Nico




Re: Does job restart resume from last known internal checkpoint?

Moiz Jinia
Thanks for that! Yes, I indeed did not receive those emails, and my question is answered.

Moiz
