(DEPRECATED) Apache Flink User Mailing List archive.

Question on checkpoint management

Cliff Resnick

Question on checkpoint management

When a job cancel-with-savepoint finishes a successful Savepoint, the preceding last successful Checkpoint is removed. Is this the intended behavior? I thought that checkpoints and savepoints were separate entities and, as such, savepoints should not infringe on checkpoints. This is actually an issue for us because we have seen occurrences of false-positive successful savepoints, perhaps due to S3 latency. Bottom line, we'd like to treat savepoints as insurance rather than the critical path and would rather they be oblivious to checkpoint management.

We are using externalized checkpoints, which may be confusing things. Also I know checkpoint management is undergoing some changes in Flink 1.3 (we are on Flink 1.2.0). Any insight is greatly appreciated.

Stefan Richter

Re: Question on checkpoint management

I think this JIRA issue is relevant to your question: https://issues.apache.org/jira/browse/FLINK-6328

On 08.05.2017 at 19:33, Cliff Resnick <[hidden email]> wrote:

When a job cancel-with-savepoint finishes a successful Savepoint, the preceding last successful Checkpoint is removed. Is this the intended behavior? I thought that checkpoints and savepoints were separate entities and, as such, savepoints should not infringe on checkpoints. This is actually an issue for us because we have seen occurrences of false-positive successful savepoints, perhaps due to S3 latency. Bottom line, we'd like to treat savepoints as insurance rather than the critical path and would rather they be oblivious to checkpoint management.

We are using externalized checkpoints, which may be confusing things. Also I know checkpoint management is undergoing some changes in Flink 1.3 (we are on Flink 1.2.0). Any insight is greatly appreciated.