Checkpoint/ Savepoint usage


Rinat
Hi mates, while using BucketingSink I decided to enable checkpointing, to prevent files being left hanging in the open state after a job failure.
But it seems I haven't properly understood the meaning of checkpointing …

I've enabled the fs backend for checkpoints, and while the job is running everything works fine: a file with the state is created, and if I kill the TaskManager, the state is restored.
But when I kill the whole job and run it again, the state from the last checkpoint isn't used, and a new state is created instead.

If I understand correctly, checkpoint state is used by the JobManager while the job is running, and if I want to cancel/kill the job, I should use savepoints.

So I have the following questions:

  • are my assumptions about checkpoint/savepoint state usage correct?
  • when I'm creating a savepoint, can only HDFS be used as a backend?
  • when I'm using RocksDB, can it only be used as a checkpointing backend, and when I decide to create a savepoint, will it be stored in HDFS?
  • is there any way to configure the job to use the last checkpoint as its starting state out of the box?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

mobile: +7 (925) 416-37-26

CleverDATA
make your data clever


Re: Checkpoint/Savepoint usage

gerryzhou
Hi Rinat,

> are my assumptions about checkpoint/savepoint state usage correct?

Indeed, that's a bit incorrect: you can also restore a job from a checkpoint. By default, the checkpoint data is removed when the job finishes (e.g. when it is cancelled by the user), but you can configure Flink to retain checkpoints after the job finishes. That way, if you cancel the job, the checkpoint will be retained, and you can restore your job from it later. See https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html#retained-checkpoints for more information.
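The retained-checkpoints behaviour described above can be enabled from the job code itself. A minimal sketch against the Flink 1.5-era `CheckpointConfig` API (the interval and job name here are only illustrative):

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Take a checkpoint every 60 seconds.
        env.enableCheckpointing(60_000);
        // Keep externalized checkpoint data around even when the job is
        // cancelled, so it can be used for a later restore.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // ... sources, transformations, BucketingSink, etc. ...
        env.execute("job-with-retained-checkpoints");
    }
}
```

A job can then be resumed from a retained checkpoint the same way as from a savepoint, e.g. `flink run -s hdfs:///flink/checkpoints/<job-id>/chk-42 job.jar` (the path is hypothetical; substitute the actual checkpoint directory).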

> when I'm creating a savepoint, can only HDFS be used as a backend?

No. As long as all of Flink's components (the TMs and the JM) can access the target storage, you can create a savepoint there: for example S3, or even the local file system (if your TMs and JM run on a single machine).

> when I'm using RocksDB, can it only be used as a checkpointing backend, and when I decide to create a savepoint, will it be stored in HDFS?

I think there may be a little misunderstanding here: the RocksDB backend is used for storing keyed state, just like the heap backend; it doesn't determine the checkpoint backend at all.
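To make the distinction concrete, here is a minimal sketch (Flink 1.5-era API; the HDFS path is a placeholder). RocksDB holds the working keyed state on each TaskManager's local disk; the URI passed to the backend only says where checkpoint snapshots are written:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // RocksDB keeps live keyed state on the TaskManager's local disk;
        // the (placeholder) URI below only determines where checkpoint
        // snapshots of that state are uploaded.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
        // ... keyed operators using ValueState, MapState, etc. ...
        env.execute("job-with-rocksdb-backend");
    }
}
```

When a savepoint is later triggered, e.g. `flink savepoint <jobId> s3://my-bucket/savepoints` (bucket name hypothetical), it is written to the directory given on the command line, independently of the RocksDB choice.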

> is there any way to configure the job to use the last checkpoint as its starting state out of the box?

No, that is currently unsupported. I think you've hit an issue that is currently under discussion; it's a bit tricky to do for a few reasons. See https://issues.apache.org/jira/browse/FLINK-9043 for more information.

Best, Sihua

On 06/14/2018 03:54, [hidden email] wrote: