Checkpointing on cluster shutdown

James Isaac
Hi,

Suppose I have a working Flink cluster with 1 TaskManager and 1 JobManager, and I have enabled checkpointing with an interval of, say, 1 minute.
Now, if I shut down the Flink cluster between checkpoints (say, for an upgrade), will the JobManager automatically trigger a checkpoint before going down?

Or is it mandatory to trigger a savepoint manually in these cases?
Also, am I correct in understanding that if a TaskManager goes down first, there is no way for it to trigger a checkpoint on its own?



Re: Checkpointing on cluster shutdown

Chesnay Schepler
No checkpoint is triggered when the cluster is shut down. For this
case you will have to trigger a savepoint manually.

If a TM goes down, it does not create a checkpoint. In these cases the
job is restarted from the last successful checkpoint.
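
As an illustration of triggering a savepoint manually before a shutdown: besides the CLI (`bin/flink savepoint <jobId> [targetDirectory]`), Flink's REST API offers a savepoint endpoint. The sketch below only builds the request and does not send it. The endpoint path and JSON field names follow my reading of the REST documentation and should be checked against the docs for your Flink version; the host, port, job ID, and target directory are placeholders, not values from this thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a savepoint trigger request.
// The REST address, job ID and target directory passed in are assumptions.
final class SavepointRequestSketch {

    /** JSON body asking for a savepoint in targetDir, optionally cancelling the job afterwards. */
    static String body(String targetDir, boolean cancelJob) {
        return "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": " + cancelJob + "}";
    }

    /** POST <restBase>/jobs/<jobId>/savepoints against the JobManager's REST address. */
    static HttpRequest build(String restBase, String jobId, String targetDir, boolean cancelJob) {
        return HttpRequest.newBuilder()
                .uri(URI.create(restBase + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(targetDir, cancelJob)))
                .build();
    }
}
```

A call such as `SavepointRequestSketch.build("http://localhost:8081", "<jobId>", "file:///tmp/savepoints", true)` yields a request that could be sent with `java.net.http.HttpClient` before taking the cluster down. The job can later be resumed from the resulting savepoint path with `bin/flink run -s <savepointPath> ...`.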

On 05.06.2018 12:01, Data Engineer wrote:

> Hi,
>
> Suppose I have a working Flink cluster with 1 taskmanager and 1
> jobmanager and I have enabled checkpointing with say an interval of 1
> minute.
> Now if I shut down the Flink cluster in between checkpoints (say for
> some upgrade), will the JobManager automatically trigger a checkpoint
> before going down?
>
> Or is it mandatory to manually trigger savepoints in these cases?
> Also am I correct in my understanding that if a taskmanager goes down
> first, there is no way the TaskManager can trigger the checkpoint on
> its own?


Re: Checkpointing on cluster shutdown

Garvit Sharma
But the job should be terminated gracefully. Why is this behavior not there?

On Tue, Jun 5, 2018 at 4:19 PM, Chesnay Schepler <[hidden email]> wrote:
> No checkpoint will be triggered when the cluster is shut down. For this case you will have to manually trigger a savepoint.
>
> If a TM goes down it does not create a checkpoint. In these cases the job will be restarted from the last successful checkpoint.



--

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination that makes him master.

Re: Checkpointing on cluster shutdown

Chesnay Schepler
If a TM goes down, any data generated after the last successful checkpoint cannot be guaranteed to be consistent across the cluster.
Hence, this data is discarded and we go back to the last known consistent state: the last checkpoint that was successfully created.
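
The rollback described above can be illustrated with a toy model. This is not Flink code. It is a minimal sketch that assumes a single operator keeping a running sum and a source that can be rewound to a stored position (as with Kafka offsets). All class and method names are made up for the example.

```java
// Toy model, not Flink code: a running sum as operator state with periodic checkpoints.
final class CheckpointRollbackSketch {
    private long state = 0;              // live operator state
    private int offset = 0;              // number of records consumed so far
    private long checkpointedState = 0;  // state captured by the last successful checkpoint
    private int checkpointedOffset = 0;  // input position captured together with that state

    void process(int record) { state += record; offset++; }

    void checkpoint() { checkpointedState = state; checkpointedOffset = offset; }

    /** Simulated failure: discard everything since the last checkpoint; returns the replay position. */
    int failAndRestore() {
        state = checkpointedState;
        offset = checkpointedOffset;
        return offset;
    }

    long state() { return state; }

    /** Processes input, checkpointing every N records and failing once after failAfter records. */
    static long runWithFailure(int[] input, int checkpointEvery, int failAfter) {
        CheckpointRollbackSketch job = new CheckpointRollbackSketch();
        boolean failed = false;
        int i = 0;
        while (i < input.length) {
            job.process(input[i]);
            i++;
            if (i % checkpointEvery == 0) job.checkpoint();
            if (!failed && i == failAfter) {
                failed = true;
                i = job.failAndRestore(); // rewind the source and replay from the checkpoint
            }
        }
        return job.state();
    }
}
```

With input `{1, 2, 3, 4, 5}`, a checkpoint every 2 records, and a failure after record 3, the work done on record 3 is thrown away, the state returns to the value captured at the checkpoint, and record 3 is replayed. The final sum is the same as in a run without the failure, which is the point of restoring from the last consistent state.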

On 05.06.2018 13:06, Garvit Sharma wrote:
But job should be terminated gracefully. Why is this behavior not there?
