(DEPRECATED) Apache Flink User Mailing List archive.

yarn and checkpointing

Gwenhael Pasquiers

yarn and checkpointing

Hi,

Is it possible to use checkpointing to restore the state of an app after a restart on YARN?

From what I've seen, checkpointing only works within the lifetime of a Flink cluster. However, YARN mode runs one cluster per app, and (unless the app crashes and is automatically restarted by the restart strategy) the cluster on YARN has the same lifetime as the app. So when we stop the app, we also stop the cluster, which cleans up its checkpoints.

So when the app is stopped, the cluster dies and cleans the checkpoints folder. Then of course it won't be able to restore the state on the next run.

When running Flink on YARN, are we supposed to cancel with a savepoint and then restore from that savepoint?

Chesnay Schepler

Re: yarn and checkpointing

Checkpoints are only used for recovery during job execution.

If the entire cluster is shut down and restarted, you will need to take a
savepoint and restore from that.
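
For illustration, the cancel-with-savepoint and restore cycle could look roughly like the sketch below. It assumes the `flink` command-line client with its `cancel -s` and `run -s` options; the function names, the `FLINK_BIN` variable, and all arguments are hypothetical placeholders, and how the client locates the cluster on YARN depends on your setup and Flink version.

```shell
#!/bin/sh
# Sketch only: FLINK_BIN and both function names are hypothetical helpers,
# not part of Flink. Point FLINK_BIN at your own bin/flink if needed.
FLINK_BIN="${FLINK_BIN:-flink}"

# Cancel a running job and trigger a savepoint into the given directory.
# $1 = savepoint target directory, $2 = Flink job ID
cancel_with_savepoint() {
  "$FLINK_BIN" cancel -s "$1" "$2"
}

# Submit the job again, restoring its state from an existing savepoint.
# $1 = path of the savepoint, $2 = job jar
run_from_savepoint() {
  "$FLINK_BIN" run -s "$1" "$2"
}
```

The client should report where the savepoint was written when the job is cancelled; that path is what you would pass to `run_from_savepoint` when starting the new cluster.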

On 29.08.2017 16:46, Gwenhael Pasquiers wrote:
> Hi,
>
> Is it possible to use checkpointing to restore the state of an app after a restart on YARN?
>
> From what I've seen, checkpointing only works within the lifetime of a Flink cluster. However, YARN mode runs one cluster per app, and (unless the app crashes and is automatically restarted by the restart strategy) the cluster on YARN has the same lifetime as the app. So when we stop the app, we also stop the cluster, which cleans up its checkpoints.
>
> So when the app is stopped, the cluster dies and cleans the checkpoints folder. Then of course it won't be able to restore the state on the next run.
>
> When running Flink on YARN, are we supposed to cancel with a savepoint and then restore from that savepoint?