Is chk-$id/_metadata created regardless of enabling externalized checkpoints?


Dongwon Kim-2
Hi,

First of all, happy new year!
This may be a very basic question, but there is something I'd like to clarify.

My flink-conf.yaml is as follows (note that I didn't set "execution.checkpointing.externalized-checkpoint-retention" [1]):

#...
execution.checkpointing.interval: 20min
execution.checkpointing.min-pause: 1min

state.backend: rocksdb
state.backend.incremental: true

state.checkpoints.dir: hdfs:///flink-jobs/ckpts
state.checkpoints.num-retained: 10

state.savepoints.dir: hdfs:///flink-jobs/svpts
#...


The Web UI shows the checkpoint configuration as follows (note that "Persist Checkpoints Externally" is "Disabled" in the final row):
image.png

According to [2],
  • externalized checkpoints: You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their meta data out to persistent storage and are not automatically cleaned up when the job fails. This way, you will have a checkpoint around to resume from if your job fails. There are more details in the deployment notes on externalized checkpoints.

So I thought the metadata of a checkpoint was kept only in the JobManager's memory and not stored on HDFS unless "execution.checkpointing.externalized-checkpoint-retention" is set.

However, even without setting the value, every checkpoint already contains its own metadata:

[user@devflink conf]$ hdfs dfs -ls /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/*
Found 1 items
-rw-r--r--   3 user hdfs     163281 2021-01-04 14:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-945/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     163281 2021-01-04 14:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-946/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     163157 2021-01-04 15:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-947/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     156684 2021-01-04 15:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-948/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     147280 2021-01-04 15:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-949/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     147280 2021-01-04 16:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-950/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     162937 2021-01-04 16:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-951/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     175089 2021-01-04 16:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-952/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     173289 2021-01-04 17:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-953/_metadata
Found 1 items
-rw-r--r--   3 user hdfs     153951 2021-01-04 17:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-954/_metadata
Found 21 items
-rw-r--r--   3 user hdfs      78748 2021-01-04 14:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/05d76f4e-3d9c-420c-8b87-077fc9880d9a
-rw-r--r--   3 user hdfs      23905 2021-01-04 15:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/0b9d9323-9f10-4fc2-8fcc-a9326448b07c
-rw-r--r--   3 user hdfs      81082 2021-01-04 16:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/0f6779d0-3a2e-4a94-be9b-d9d6710a7ea0
-rw-r--r--   3 user hdfs      23905 2021-01-04 16:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/107b3b74-634a-462c-bf40-1d4886117aa9
-rw-r--r--   3 user hdfs      78748 2021-01-04 14:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/18a538c6-d40e-48c0-a965-d65be407a124
-rw-r--r--   3 user hdfs      83550 2021-01-04 16:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/24ed9c4a-0b8e-45d4-95b8-64547cb9c541
-rw-r--r--   3 user hdfs      23905 2021-01-04 17:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/35ee9665-7c1f-4407-beb5-fbb312d84907
-rw-r--r--   3 user hdfs      47997 2021-01-04 11:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/36363172-c401-4d60-a970-cfb2b3cbf058
-rw-r--r--   3 user hdfs      81082 2021-01-04 15:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/43aecc8c-145f-43ba-81a8-b0ce2c3498f4
-rw-r--r--   3 user hdfs      79898 2021-01-04 15:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/5743f278-fc50-4c4a-b14e-89bfdb2139fa
-rw-r--r--   3 user hdfs      23905 2021-01-04 16:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/67e16688-c48c-409b-acac-e7091a84d548
-rw-r--r--   3 user hdfs      23905 2021-01-04 16:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/773ef43d-936a-4f33-9b0a-d3ff090637c7
-rw-r--r--   3 user hdfs      82046 2021-01-04 16:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/81ac58ef-8810-4fa6-ad8f-a5ec0c0cc885
-rw-r--r--   3 user hdfs      86089 2021-01-04 17:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/8e202c6a-f702-487b-bd00-43739a8c79a2
-rw-r--r--   3 user hdfs      84875 2021-01-04 17:05 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/a6d4db40-2efe-495c-8e94-a9c31876e4d3
-rw-r--r--   3 user hdfs      23905 2021-01-04 17:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/b54c5d30-b152-4fba-b0ac-dba598c93646
-rw-r--r--   3 user hdfs      23905 2021-01-04 15:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/c36433cf-9e79-46ee-a93f-fe042e3c583f
-rw-r--r--   3 user hdfs      23905 2021-01-04 14:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/e8a27366-4764-4ef0-ae6b-85ed936f6935
-rw-r--r--   3 user hdfs      80747 2021-01-04 15:25 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/eb6476de-1e35-4d0c-bc6b-2f3214abfffd
-rw-r--r--   3 user hdfs      23905 2021-01-04 15:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/efd13c04-cbac-4c68-a132-1f9dc9afc7b4
-rw-r--r--   3 user hdfs      23905 2021-01-04 14:45 /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/f63ba16a-6664-49b6-878f-efba342270be


Resuming from a checkpoint directory (e.g. /flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-954) also works as expected.
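
For readers who want to pick the newest checkpoint directory out of a listing like the one above, here is a minimal sketch in plain Java. It is not part of Flink; the class name `LatestCheckpoint` and the method `latest` are made up for illustration, and it only assumes the `.../chk-<id>/_metadata` layout visible in the listing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (hypothetical, not a Flink class). */
final class LatestCheckpoint {

    // Matches ".../chk-<number>", optionally followed by "/_metadata".
    private static final Pattern CHK =
            Pattern.compile("^(.*/chk-(\\d+))(?:/_metadata)?$");

    /** Returns the chk-<id> directory with the highest id among the given paths. */
    static Optional<String> latest(List<String> paths) {
        String best = null;
        long bestId = -1;
        for (String path : paths) {
            Matcher m = CHK.matcher(path);
            if (m.matches()) {
                long id = Long.parseLong(m.group(2));
                if (id > bestId) {
                    bestId = id;
                    best = m.group(1);
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-953/_metadata",
                "/flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/chk-954/_metadata",
                "/flink-jobs/ckpts/76fc265c44ef44ae343ab15868155de6/shared/05d76f4e-3d9c-420c-8b87-077fc9880d9a");
        // Files under shared/ are ignored because they do not match chk-<id>.
        System.out.println(latest(paths).orElse("no checkpoint found"));
    }
}
```

The directory it returns is the kind of path that can be passed to `flink run -s` when resuming a job.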

So I'm wondering:
- Is every checkpoint meant to have its metadata on HDFS even without setting "execution.checkpointing.externalized-checkpoint-retention"?
- Is setting "execution.checkpointing.externalized-checkpoint-retention" only needed when I want to retain checkpoints in case a job fails or is intentionally cancelled?


Best,

Dongwon

Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?

Yun Gao
Hi Dongwon,

Happy new year! A metadata file is stored on HDFS even if externalized checkpoints are not enabled. If externalized checkpoints are not enabled, Flink deletes all the checkpoints when the job exits; if they are enabled, the checkpoints are kept when the job is cancelled or fails, according to the setting. So for your second question, I think the answer is yes.
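
The behaviour described above can be summarised in a small stand-alone sketch. This is only an illustrative model: the class `RetentionSketch` and its enums are hypothetical and are not Flink types, and it covers just the two endings discussed here (the job is cancelled, or it fails for good).

```java
/** Illustrative model only (hypothetical names, not Flink classes). */
final class RetentionSketch {

    /** The three situations discussed in this thread. */
    enum Retention { NOT_ENABLED, DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    /** How the job ended; FAILED means a terminal failure, not a restart. */
    enum Ending { CANCELLED, FAILED }

    /** Returns true if the checkpoints are expected to remain on HDFS. */
    static boolean checkpointsKept(Retention retention, Ending ending) {
        switch (retention) {
            case RETAIN_ON_CANCELLATION:
                return true;                    // kept on cancellation and on failure
            case DELETE_ON_CANCELLATION:
                return ending == Ending.FAILED; // kept only when the job fails
            default:
                return false;                   // not enabled: cleaned up on exit
        }
    }

    public static void main(String[] args) {
        for (Retention r : Retention.values()) {
            for (Ending e : Ending.values()) {
                System.out.println(r + " + " + e + " -> kept=" + checkpointsKept(r, e));
            }
        }
    }
}
```

In every case the `_metadata` file is written while the job runs; the setting only decides what is left behind afterwards.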

Best,
 Yun

------------------Original Mail ------------------
Sender:Dongwon Kim <[hidden email]>
Send Date:Mon Jan 4 19:16:39 2021
Recipients:user <[hidden email]>
Subject:Is chk-$id/_metadata created regardless of enabling externalized checkpoints?

Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?

Dongwon Kim-2
Thanks, Yun, for the explanation :) It really helps a lot.

A related question: how can I enable externalized checkpoints in flink-conf.yaml?

It seems that setting "execution.checkpointing.externalized-checkpoint-retention" to RETAIN_ON_CANCELLATION or DELETE_ON_CANCELLATION in flink-conf.yaml is not enough.
The final row shows that it is not enabled even after setting it to either one (FYI, I'm using Flink 1.12.0).
image.png
Even with it set to RETAIN_ON_CANCELLATION, I found that a cancelled job cleans up all its checkpoints on HDFS, which contradicts the definition of RETAIN_ON_CANCELLATION.

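So I had to add the following lines to my Flink application:
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
Now externalized checkpoints seem to be enabled:
image.png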

Does it have nothing to do with FLIP-59? Am I missing something or is it a bug?

Thanks, 

Dongwon

On Tue, Jan 5, 2021 at 12:04 AM Yun Gao <[hidden email]> wrote:

Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?

Yun Tang
Hi Dongwon,

What's the actual setting of this option? Setting 'execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION' should work.
This is verified in tests, and I have also confirmed it with jobs I submitted.

Best
Yun Tang



From: Dongwon Kim <[hidden email]>
Sent: Tuesday, January 5, 2021 1:46
To: Yun Gao <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?
 

Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?

Dongwon Kim-2
Yun,
I just checked, and it works.
Sorry for the confusion (I might have modified a flink-conf.yaml in a different location.. T.T)
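
For anyone hitting the same thing, a quick way to check whether a given flink-conf.yaml really carries the option is a small stand-alone reader like the sketch below. It is only an illustration: `ConfCheck` and `lookup` are hypothetical names, this is not how Flink itself loads its configuration, and letting a later line override an earlier one is simply a choice made in this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/** Illustrative checker (hypothetical, not Flink's configuration loader). */
final class ConfCheck {

    static final String KEY = "execution.checkpointing.externalized-checkpoint-retention";

    /** Returns the value of the last "key: value" line for the key, skipping comments. */
    static Optional<String> lookup(List<String> lines, String key) {
        String found = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            int sep = line.indexOf(": ");
            if (sep < 0) {
                continue; // not a simple "key: value" line
            }
            if (line.substring(0, sep).trim().equals(key)) {
                found = line.substring(sep + 2).trim();
            }
        }
        return Optional.ofNullable(found);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConfCheck <path to flink-conf.yaml>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(KEY + " = " + lookup(lines, KEY).orElse("(not set in this file)"));
    }
}
```

Running it against the conf directory the cluster actually starts from shows right away whether the edit landed in the right file. The key has to match exactly, so a variant spelled with dashes instead of dots is reported as not set.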

Best,

Dongwon


On Tue, Jan 5, 2021 at 3:38 PM Yun Tang <[hidden email]> wrote:
Hi Dongwon,

What's the actual setting of this option? Setting the 'execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION' should work. 
This is verified in tests and I also confirm this in my submitted jobs.

Best
Yun Tang



From: Dongwon Kim <[hidden email]>
Sent: Tuesday, January 5, 2021 1:46
To: Yun Gao <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: Is chk-$id/_metadata created regardless of enabling externalized checkpoints?
 
Thanks Yun for explanation :) it really helps a lot. 

A related question is how I can enable externalized checkpoint in flink-conf.yaml?

It seems that setting "execution-checkpointing-externalized-checkpoint-retention" to RETAIN_ON_CANCELLATION or DELETE_ON_CANCELLATION in flink-conf.yaml is not enough.
The final row of the Web UI shows that it is still not enabled after setting it to either value (FYI, I'm using Flink 1.12.0).
image.png
Even with it set to RETAIN_ON_CANCELLATION, I found that a cancelled job cleans up all its checkpoints on HDFS, which contradicts the definition of RETAIN_ON_CANCELLATION.

So I have to add the following lines in my Flink application:
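// Requires: import org.apache.flink.streaming.api.environment.CheckpointConfig;
//           import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);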
Now the external checkpoint seems to be enabled:
image.png

Is this unrelated to FLIP-59? Am I missing something, or is it a bug?

Thanks, 

Dongwon

On Tue, Jan 5, 2021 at 12:04 AM Yun Gao <[hidden email]> wrote:
Hi Dongwon,

   Happy new year! A metadata file is stored on HDFS even if externalized checkpoints are not enabled. If externalized checkpoints are not enabled, Flink deletes all the checkpoints on exit; if they are enabled, the checkpoints are kept when the job is cancelled or fails, according to the setting. So for the second question, I think the answer is yes.
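
The retention behaviour described above can be summarised in a small stand-alone Java sketch. It does not use the Flink API: the class, enum, and method names are hypothetical and only model the outcomes. The DELETE_ON_CANCELLATION row (kept on failure only) follows the Flink documentation for that option rather than this thread, and "failed" here is assumed to mean the job reaching a terminal failed state, not an automatic restart.

```java
// Hypothetical model of checkpoint clean-up on job termination; not Flink code.
class RetentionSketch {

    /** The three cases discussed in the thread (NOT_EXTERNALIZED is an illustrative name). */
    public enum Retention { NOT_EXTERNALIZED, DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    /** How the job ended. */
    public enum JobEnd { CANCELLED, FAILED }

    /**
     * Returns true if the chk-N directories (including _metadata) are expected
     * to remain on storage after the job ends in the given way. While the job
     * is running, _metadata is written in all three cases.
     */
    public static boolean checkpointsKept(Retention retention, JobEnd end) {
        switch (retention) {
            case RETAIN_ON_CANCELLATION:
                return true;                 // kept on cancellation and on failure
            case DELETE_ON_CANCELLATION:
                return end == JobEnd.FAILED; // kept only when the job fails
            default:
                return false;                // not externalized: cleaned up on exit
        }
    }

    public static void main(String[] args) {
        // Print the full outcome table for every combination.
        for (Retention r : Retention.values()) {
            for (JobEnd e : JobEnd.values()) {
                System.out.println(r + " / " + e + " -> kept=" + checkpointsKept(r, e));
            }
        }
    }
}
```

Read this way, the option controls only what happens to checkpoints after the job terminates, not whether _metadata is written in the first place.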

Best,
 Yun

------------------Original Mail ------------------
Sender:Dongwon Kim <[hidden email]>
Send Date:Mon Jan 4 19:16:39 2021
Recipients:user <[hidden email]>
Subject:Is chk-$id/_metadata created regardless of enabling externalized checkpoints?