External checkpoint metadata in Flink 1.5.x

Gyula Fóra
Hi,
It seems that the behaviour for storing the checkpoint metadata files of externalized checkpoints changed from 1.4 to 1.5, and the docs seem to be incorrect in saying:
"state.checkpoints.dir: The target directory for meta data of externalized checkpoints"

Now the external metadata files simply end up in the checkpoint dir as configured in the state backend, so basically the "state.checkpoints.dir" param is overwritten by the job. This breaks a bunch of tooling around checkpoints that relies on a common place for metadata files instead of having them scattered around in subdirectories. Is there any way to get the previous behaviour back?

Thanks,
Gyula
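
To make the layout change concrete, here is a minimal Python sketch of what tooling has to do under the 1.5 behaviour described above: walk the checkpoint directory and collect the metadata file from each per-checkpoint subdirectory. It assumes the layout `<checkpoint dir>/<job id>/chk-<n>/_metadata`, which is not spelled out in the thread; the function name is hypothetical as well.

```python
import re
from pathlib import Path

# Assumed Flink 1.5 layout: <checkpoint_dir>/<job_id>/chk-<n>/_metadata
CHK_DIR = re.compile(r"^chk-(\d+)$")


def find_checkpoint_metadata(checkpoint_dir):
    """Return (job_id, checkpoint_number, path) for every metadata file found."""
    found = []
    for meta in Path(checkpoint_dir).glob("*/chk-*/_metadata"):
        match = CHK_DIR.match(meta.parent.name)
        if match and meta.is_file():
            job_id = meta.parent.parent.name
            found.append((job_id, int(match.group(1)), meta))
    # Sort by job id, then by checkpoint number, so the result is deterministic.
    return sorted(found, key=lambda t: (t[0], t[1]))
```

With a single flat metadata directory, none of this crawling would be needed, which is the convenience the message is asking to get back.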

Re: External checkpoint metadata in Flink 1.5.x

Aljoscha Krettek
Hi,

Does that mean https://issues.apache.org/jira/browse/FLINK-5627 is no longer relevant for you, since it seems to request the behaviour that we have now?

But yes, I think it's currently not possible (with out-of-the-box functionality) to write externalized-checkpoint metadata to a central location. There is a Jira issue that aims at implementing a solution: https://issues.apache.org/jira/browse/FLINK-9114.

I quickly talked to Stephan; it seems that the meta info about externalized checkpoints is also written to the HA storage directory. Maybe that's helpful for you.

Best,
Aljoscha

In reply to Gyula Fóra's message of 12 Jul 2018, 14:18.


Re: External checkpoint metadata in Flink 1.5.x

Gyula Fóra
Hi!

Well, it depends on how we look at it: FLINK-5627 is not necessarily the current behaviour. You still can't really specify the exact location from within the job, as it now goes to a checkpoint-specific place determined by the checkpoint dir. I will close the Jira as FLINK-9114 covers it in a more generic way.

I can probably work around the current behaviour in the meantime :)

Cheers,
Gyula

In reply to Aljoscha Krettek's message of Thu, 12 Jul 2018, 17:35.


Re: External checkpoint metadata in Flink 1.5.x

Aljoscha Krettek
Out of curiosity, how will you work around it? And how is it easier for your tooling if checkpoints are in a central location?

Best,
Aljoscha

In reply to Gyula Fóra's message of 12 Jul 2018, 17:55.



Re: External checkpoint metadata in Flink 1.5.x

Gyula Fóra
Hi,
To be fair, my "workaround" is to restore the previous behaviour in our Flink build, as it only takes a few lines of code :)

To explain why this is convenient, I can describe how we use the checkpoints/savepoints.

For every streaming application (not Flink job) we have a unique name from which the savepoint directory is derived. This means that all savepoints + external metadata pointers should end up in this directory, and we can easily pick out the latest one for the current application when we want to restore.

This is extremely convenient compared to crawling through a bunch of nested directories to find the latest, and it also makes migrating checkpoint directories and similar tasks more straightforward.

Cheers,
Gyula
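
The "pick out the latest" step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the naming rule in `savepoint_dir_for` is hypothetical (the thread does not say how the directory is derived from the application name), and "latest" is taken to mean the most recently modified entry directly under that directory.

```python
from pathlib import Path


def savepoint_dir_for(app_name, base_dir):
    """Hypothetical naming rule: one directory per application name."""
    return Path(base_dir) / app_name


def latest_entry(directory):
    """Return the most recently modified entry directly under directory, or None."""
    directory = Path(directory)
    entries = list(directory.iterdir()) if directory.is_dir() else []
    # No recursion: every savepoint and metadata pointer is assumed to sit here.
    return max(entries, key=lambda p: p.stat().st_mtime, default=None)
```

Because everything for one application sits in a single flat directory, restoring only needs one directory listing, and moving the application's state means moving one directory.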

In reply to Aljoscha Krettek's message of Fri, 13 Jul 2018, 8:29.