yarn and checkpointing

Gwenhael Pasquiers
Hi,

Is it possible to use checkpointing to restore the state of an app after a restart on YARN?

From what I've seen, checkpointing only works within the lifetime of a Flink cluster. However, in YARN mode there is one cluster per app, and (unless the app crashes and is automatically restarted by the restart strategy) the cluster on YARN has the same lifetime as the app. So when we stop the app, we also stop the cluster, which cleans up its checkpoints.

In other words, when the app is stopped, the cluster dies and cleans the checkpoints folder, so of course the state cannot be restored on the next run.

When running Flink on YARN, are we supposed to cancel with a savepoint and then restore from that savepoint?

Re: yarn and checkpointing

Chesnay Schepler
Checkpoints are only used for recovery during the job execution.

If the entire cluster is shut down and restarted, you will need to take a
savepoint and restore from that.
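
For illustration, the cancel-with-savepoint and restore cycle could be scripted roughly as below. This is only a sketch, not an official recipe: it assumes the `flink` CLI is on the PATH and accepts `cancel -s <targetDirectory> <jobID>`, `run -s <savepointPath>`, `-m yarn-cluster` and `-yid <yarnAppId>` as it did around the time of this thread (newer releases may prefer other commands). The helper names, job ID, paths and jar name are hypothetical placeholders.

```python
import subprocess


def cancel_with_savepoint_cmd(job_id, savepoint_dir, yarn_app_id=None):
    """Build the argv that cancels a job after triggering a savepoint.

    Assumes the CLI form `flink cancel -s <targetDirectory> <jobID>`;
    `-yid` (assumed flag) points the client at the YARN application.
    """
    cmd = ["flink", "cancel", "-s", savepoint_dir]
    if yarn_app_id is not None:
        cmd += ["-yid", yarn_app_id]
    cmd.append(job_id)
    return cmd


def run_from_savepoint_cmd(jar_path, savepoint_path, on_yarn=True):
    """Build the argv that starts a job from an existing savepoint.

    Assumes `flink run -s <savepointPath> <jar>`, with `-m yarn-cluster`
    to start a per-job cluster on YARN.
    """
    cmd = ["flink", "run"]
    if on_yarn:
        cmd += ["-m", "yarn-cluster"]
    cmd += ["-s", savepoint_path, jar_path]
    return cmd


def execute(cmd):
    """Run a prepared command and raise if the CLI reports a failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; nothing is executed here, the commands are only printed.
    stop = cancel_with_savepoint_cmd(
        "<jobID>", "hdfs:///flink/savepoints", yarn_app_id="<yarnAppId>"
    )
    start = run_from_savepoint_cmd("my-job.jar", "<savepointPath>")
    print(" ".join(stop))
    print(" ".join(start))
```

The savepoint path to pass to the second command is the one the CLI reports when the cancel succeeds. Because the savepoint lives outside the cluster's checkpoint folder, it is still there after the per-job YARN cluster is gone.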

On 29.08.2017 16:46, Gwenhael Pasquiers wrote:
> Hi,
>
> Is it possible to use checkpointing to restore the state of an app after a restart on YARN?
>
> From what I've seen, checkpointing only works within the lifetime of a Flink cluster. However, in YARN mode there is one cluster per app, and (unless the app crashes and is automatically restarted by the restart strategy) the cluster on YARN has the same lifetime as the app. So when we stop the app, we also stop the cluster, which cleans up its checkpoints.
>
> In other words, when the app is stopped, the cluster dies and cleans the checkpoints folder, so of course the state cannot be restored on the next run.
>
> When running Flink on YARN, are we supposed to cancel with a savepoint and then restore from that savepoint?