restoring from externalized incremental rocksdb checkpoint?


restoring from externalized incremental rocksdb checkpoint?

Jeffrey Martin
Hi,

My job on Flink 1.10 uses RocksDB with incremental checkpointing enabled. The checkpoints are retained on cancellation.

How do I resume from the retained checkpoint after cancellation (e.g., when upgrading the job binary)? The docs say to use the checkpoint or savepoint metadata file, but AFAICT there's no metadata file in HDFS in any of the directories under "$checkpointsDir/snapshots/$jobID".

Thanks,

Jeff Martin

Re: restoring from externalized incremental rocksdb checkpoint?

Congxian Qiu
Hi Jeff
   You can restore from a retained checkpoint with `bin/flink run -s :checkpointMetaDataPath [:runArgs]` [1]. You should find the metadata file in the `chk-xxx` directory [2].
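For example, a restore invocation could look like the sketch below. The checkpoint path and jar name are hypothetical placeholders; substitute the actual `chk-N` directory of your retained checkpoint and your own job jar.

```shell
# Hypothetical values; substitute your own HDFS checkpoint path and job jar.
# The chk-N directory you point -s at must contain a _metadata file.
CHECKPOINT="hdfs:///checkpoints/abc123/chk-42"
JAR="my-job.jar"
CMD="bin/flink run -s ${CHECKPOINT} ${JAR}"
echo "${CMD}"
```

Echoing the command first is just a safe way to double-check the path before actually launching the job.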



Re: restoring from externalized incremental rocksdb checkpoint?

Jeffrey Martin
Thanks for the quick reply Congxian.

The non-empty chk-N directories I looked at contained only files whose names are UUIDs. Nothing named _metadata (unless HDFS hides files that start with an underscore?).

Just to be clear though -- I should expect a metadata file when using incremental checkpoints? 


Re: restoring from externalized incremental rocksdb checkpoint?

Congxian Qiu
Hi Jeff
   Sorry for the late reply. You can only restore from a checkpoint whose `chk-xxx` directory contains a `_metadata` file. If there is no `_metadata` in the `chk-xxx` directory, that checkpoint is incomplete and you can't restore from it.

Best,
Congxian
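A quick way to find the newest restorable retained checkpoint is to scan for the highest-numbered `chk-N` directory that actually contains `_metadata`. The sketch below works against a local filesystem path as a stand-in; on HDFS you would list directories with `hdfs dfs -ls` instead of a shell glob. The function name is illustrative.

```shell
# Sketch: print the highest-numbered chk-N directory containing _metadata,
# i.e. the newest checkpoint that is complete enough to restore from.
# Local-filesystem stand-in; for HDFS, list paths via `hdfs dfs -ls` instead.
latest_restorable_checkpoint() {
  dir="$1"
  best=""
  best_n=-1
  for d in "$dir"/chk-*/; do
    [ -f "${d}_metadata" ] || continue   # skip incomplete checkpoints
    n="${d##*chk-}"
    n="${n%/}"
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="${d%/}"
    fi
  done
  echo "$best"
}
```

The path this prints is what you would then pass to `bin/flink run -s`.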

