Externalized checkpoints

Vishwas Siravara
Hi peeps,
I am externalizing checkpoints to S3 for my Flink job, and I retain them on cancellation. However, when I look in the S3 bucket where the checkpoints are stored, there is only one checkpoint at any point in time. Is this Flink's default behavior, where older checkpoints are deleted when the current checkpoint completes? Here are two listings of the bucket. What are your thoughts on restoring an older state rather than the most recent one?

List contents of bucket at time 0
Object Name: checkpoints/fb9fea316bf2d530a6fc54ea107d66d4/chk-6/6af4f345-49e0-4ae1-baae-1f7c4d71ebf4
Last modified time: Wed Aug 21 22:17:23 GMT 2019
Object Name: checkpoints/fb9fea316bf2d530a6fc54ea107d66d4/chk-6/_metadata
Last modified time: Wed Aug 21 22:17:24 GMT 2019

List contents of bucket at time 1
Printing last modified times
Object Name: checkpoints/fb9fea316bf2d530a6fc54ea107d66d4/chk-12/7cf17042-7790-4909-9252-73511d93f518
Last modified time: Wed Aug 21 22:23:24 GMT 2019
Object Name: checkpoints/fb9fea316bf2d530a6fc54ea107d66d4/chk-12/_metadata
Last modified time: Wed Aug 21 22:23:24 GMT 2019

Thanks,
Vishwas

Re: Externalized checkpoints

Vishwas Siravara
I am also using exactly-once checkpointing mode. I have a Kafka source and sink, both of which support transactions, which should allow for exactly-once processing. Is this the reason why only one checkpoint is retained?

Thanks,
Vishwas 

Re: Externalized checkpoints

Zhu Zhu
Hi Vishwas,

You can configure "state.checkpoints.num-retained" to specify the maximum number of completed checkpoints to retain.
The default is 1.
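
To make the effect of that setting concrete, here is a minimal sketch of a bounded store of completed checkpoints. It is a toy model in plain Python, not Flink code: the class `CompletedCheckpointStore` and the function `simulate` are hypothetical names introduced only for illustration, and the only thing taken from this thread is the idea that at most `num_retained` completed checkpoints are kept, with the oldest dropped first.

```python
from collections import deque


class CompletedCheckpointStore:
    """Toy model of a bounded store of completed checkpoints (not Flink code)."""

    def __init__(self, num_retained=1):
        if num_retained < 1:
            raise ValueError("num_retained must be at least 1")
        self.num_retained = num_retained
        self._retained = deque()  # oldest checkpoint on the left
        self.deleted = []         # checkpoint ids dropped so far, in order

    def add_completed(self, checkpoint_id):
        """Register a newly completed checkpoint and drop the oldest extras."""
        self._retained.append(checkpoint_id)
        while len(self._retained) > self.num_retained:
            self.deleted.append(self._retained.popleft())

    def retained(self):
        """Directory-style names of the checkpoints still kept."""
        return [f"chk-{cid}" for cid in self._retained]


def simulate(num_retained, completed_ids):
    """Feed a sequence of completed checkpoint ids through the store."""
    store = CompletedCheckpointStore(num_retained)
    for cid in completed_ids:
        store.add_completed(cid)
    return store.retained()


if __name__ == "__main__":
    print(simulate(1, range(1, 13)))  # a single retained checkpoint
    print(simulate(3, range(1, 13)))  # the three most recent checkpoints
```

With a retention of 1, only the latest checkpoint survives, which is consistent with the bucket showing chk-6 at one point and only chk-12 later. Raising the value keeps a few older checkpoints around, which is what makes restoring from something other than the latest one possible.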

Thanks,
Zhu Zhu

Re: Externalized checkpoints

Congxian Qiu
Hi Vishwas,

As Zhu Zhu said, you can set "state.checkpoints.num-retained" [1] to specify the maximum number of completed checkpoints to retain.
You may also want to look at the externalized checkpoint cleanup type [2] configuration, which controls how retained checkpoints are cleaned up.
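
As a rough illustration of what the cleanup type decides, the sketch below models which retained checkpoints are still in storage once a job ends. It assumes the cleanup type meant here corresponds to the two modes Flink names `RETAIN_ON_CANCELLATION` and `DELETE_ON_CANCELLATION`. The enum `CleanupMode` and the function `checkpoints_after_job_end` are hypothetical, written only for this example, and are not part of any Flink API.

```python
from enum import Enum


class CleanupMode(Enum):
    """Assumed cleanup modes, named after Flink's externalized checkpoint options."""
    RETAIN_ON_CANCELLATION = "RETAIN_ON_CANCELLATION"
    DELETE_ON_CANCELLATION = "DELETE_ON_CANCELLATION"


def checkpoints_after_job_end(retained, mode, end_state):
    """Return the checkpoints left in storage after the job ends.

    retained  -- checkpoint names kept while the job was running
    mode      -- a CleanupMode value
    end_state -- "CANCELED" or "FAILED" (hypothetical labels for this sketch)
    """
    if end_state not in ("CANCELED", "FAILED"):
        raise ValueError(f"unsupported end state: {end_state}")
    # Only a cancellation under DELETE_ON_CANCELLATION removes the checkpoints;
    # in this model a failed job keeps them in both modes.
    if end_state == "CANCELED" and mode is CleanupMode.DELETE_ON_CANCELLATION:
        return []
    return list(retained)


if __name__ == "__main__":
    kept = ["chk-10", "chk-11", "chk-12"]
    for mode in CleanupMode:
        for end_state in ("CANCELED", "FAILED"):
            left = checkpoints_after_job_end(kept, mode, end_state)
            print(mode.name, end_state, left)
```

In the retain-on-cancellation case, whatever is left in the bucket has to be cleaned up manually once it is no longer needed.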


Re: Externalized checkpoints

Vishwas Siravara
Got it. Thank you.
