restoring from externalized incremental rocksdb checkpoint?


restoring from externalized incremental rocksdb checkpoint?

Jeffrey Martin
Hi,

My job on Flink 1.10 uses RocksDB with incremental checkpointing enabled. The checkpoints are retained on cancellation.

How do I resume from the retained checkpoint after cancellation (e.g., when upgrading the job binary)? The docs say to use the checkpoint or savepoint metadata file, but AFAICT there's no metadata file in HDFS in any of the directories under "$checkpointsDir/snapshots/$jobID".

Thanks,

Jeff Martin

Re: restoring from externalized incremental rocksdb checkpoint?

Congxian Qiu
Hi Jeff
   You can restore from a retained checkpoint with `bin/flink run -s :checkpointMetaDataPath [:runArgs]` [1]. You should find the metadata file in the `chk-xxx` directory [2].
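For example, a restore invocation could look like the sketch below. The checkpoint path and jar name are hypothetical placeholders; substitute the actual `chk-N` directory of your retained checkpoint and your own job jar.

```shell
# Hypothetical values; substitute your own HDFS checkpoint path and job jar.
# The chk-N directory you point -s at must contain a _metadata file.
CHECKPOINT="hdfs:///checkpoints/abc123/chk-42"
JAR="my-job.jar"
CMD="bin/flink run -s ${CHECKPOINT} ${JAR}"
echo "${CMD}"
```

Echoing the command first is just a safe way to double-check the path before actually launching the job.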



Re: restoring from externalized incremental rocksdb checkpoint?

Jeffrey Martin
Thanks for the quick reply Congxian.

The non-empty chk-N directories I looked at contained only files whose names are UUIDs. Nothing named _metadata (unless HDFS hides files that start with an underscore?).

Just to be clear though -- I should expect a metadata file when using incremental checkpoints? 


Re: restoring from externalized incremental rocksdb checkpoint?

Congxian Qiu
Hi Jeff
   Sorry for the late reply. You can only restore from a checkpoint whose `chk-xxx` directory contains a `_metadata` file. If there is no `_metadata` in the `chk-xxx` directory, that checkpoint is incomplete and you can't restore from it.

Best,
Congxian
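A quick way to find the newest restorable retained checkpoint is to scan for the highest-numbered `chk-N` directory that actually contains `_metadata`. The sketch below works against a local filesystem path as a stand-in; on HDFS you would list directories with `hdfs dfs -ls` instead of a shell glob. The function name is illustrative.

```shell
# Sketch: print the highest-numbered chk-N directory containing _metadata,
# i.e. the newest checkpoint that is complete enough to restore from.
# Local-filesystem stand-in; for HDFS, list paths via `hdfs dfs -ls` instead.
latest_restorable_checkpoint() {
  dir="$1"
  best=""
  best_n=-1
  for d in "$dir"/chk-*/; do
    [ -f "${d}_metadata" ] || continue   # skip incomplete checkpoints
    n="${d##*chk-}"
    n="${n%/}"
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="${d%/}"
    fi
  done
  echo "$best"
}
```

The path this prints is what you would then pass to `bin/flink run -s`.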

