Checkpoint/ Savepoint usage


Rinat
Hi mates, while using BucketingSink I decided to enable checkpointing, to prevent files being left hanging in the open state after a job failure.
But it seems I haven't properly understood the meaning of checkpointing …

I've enabled the fs backend for checkpoints, and while the job is running everything works fine: a file with the state is created, and if I kill the TaskManager, the state is restored.
But when I kill the whole job and run it again, the state from the last checkpoint isn't used, and a new state is created instead.

If I understand correctly, checkpoint state is used by the JobManager while the job is running, and if I want to cancel/kill the job, I should use savepoints.

So I have the following questions:

  • are my assumptions about checkpoint/savepoint state usage correct?
  • when I'm creating a savepoint, can only HDFS be used as a backend?
  • when I'm using RocksDB, can it only be used as a checkpointing backend, and when I decide to create a savepoint, will it be stored in HDFS?
  • is there any way to configure the job to use the last checkpoint as its starting state out of the box?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

mobile: +7 (925) 416-37-26

CleverDATA
make your data clever


Re: Checkpoint/Savepoint usage

gerryzhou
Hi Rinat,

> are my assumptions about checkpoint/savepoint state usage correct?

Indeed, that's a bit incorrect: you can also restore a job from a checkpoint. By default, the checkpoint data is removed when the job finishes (e.g. when it is cancelled by the user), but you can configure Flink to retain checkpoints after the job finishes. That way, if you cancel the job, the checkpoint will be retained, and you can restore your job from it later. See https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html#retained-checkpoints for more information.
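The retained-checkpoints behaviour described above can be enabled from the job code itself. A minimal sketch against the Flink 1.5-era `CheckpointConfig` API (the interval and job name here are only illustrative):

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Take a checkpoint every 60 seconds.
        env.enableCheckpointing(60_000);
        // Keep externalized checkpoint data around even when the job is
        // cancelled, so it can be used for a later restore.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // ... sources, transformations, BucketingSink, etc. ...
        env.execute("job-with-retained-checkpoints");
    }
}
```

A job can then be resumed from a retained checkpoint the same way as from a savepoint, e.g. `flink run -s hdfs:///flink/checkpoints/<job-id>/chk-42 job.jar` (the path is hypothetical; substitute the actual checkpoint directory).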

> when I'm creating a savepoint, can only HDFS be used as a backend?

No. As long as all of Flink's components (the TMs and the JM) can access the target storage, you can create a savepoint there: for example S3, or even the local file system (if your TMs and JM run on a single machine).

> when I'm using RocksDB, can it only be used as a checkpointing backend, and when I decide to create a savepoint, will it be stored in HDFS?

I think there may be a little misunderstanding here: the RocksDB backend is used for storing keyed state, just like the heap backend; it doesn't determine the checkpoint backend at all.
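To make the distinction concrete, here is a minimal sketch (Flink 1.5-era API; the HDFS path is a placeholder). RocksDB holds the working keyed state on each TaskManager's local disk; the URI passed to the backend only says where checkpoint snapshots are written:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // RocksDB keeps live keyed state on the TaskManager's local disk;
        // the (placeholder) URI below only determines where checkpoint
        // snapshots of that state are uploaded.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
        // ... keyed operators using ValueState, MapState, etc. ...
        env.execute("job-with-rocksdb-backend");
    }
}
```

When a savepoint is later triggered, e.g. `flink savepoint <jobId> s3://my-bucket/savepoints` (bucket name hypothetical), it is written to the directory given on the command line, independently of the RocksDB choice.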

> is there any way to configure the job to use the last checkpoint as its starting state out of the box?

No, that is currently unsupported. I think you've hit an issue that is currently under discussion; it's a bit tricky to do for a few reasons. See https://issues.apache.org/jira/browse/FLINK-9043 for more information.

Best, Sihua

On 06/14/2018 03:54, [hidden email] wrote: