minPauseBetweenCheckpoints for failed checkpoints

minPauseBetweenCheckpoints for failed checkpoints

Dmitry Minaev-2
Hello!

I have a question about the checkpointing parameter minPauseBetweenCheckpoints, which sets the minimum pause between checkpointing attempts.
I’ve noticed the following (strange) behavior in Flink.

I set the following parameters for a sample Flink job:

Checkpointing Mode = Exactly Once
Interval = 10s
Timeout = 30s
Minimum Pause Between Checkpoints = 15s
Maximum Concurrent Checkpoints = 1
Persist Checkpoints Externally = Disabled

Then I started a job that intentionally makes some of the checkpoints fail by timeout.
I noticed that Flink takes minPauseBetweenCheckpoints into account only when a checkpoint doesn’t fail by timeout:

My first checkpoint was triggered at 18:03:11 and failed after the expected 30 seconds. But immediately after that, at 18:03:41, a new checkpoint was triggered. That doesn’t make sense to me, since I’m using minPauseBetweenCheckpoints = 15 seconds. I would expect Flink to wait 15 seconds before starting a new checkpoint.

However, minPauseBetweenCheckpoints seems to work as expected for checkpoints that complete successfully within the configured timeout. For example, my 4th checkpoint started at 18:04:41 and completed at 18:04:56, and the next checkpoint waited another 15 seconds and started at 18:05:11.
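
The timing described above can be sketched as a small model. This is only an illustration of the behavior observed in this thread, not Flink's scheduler code; the class and method names (MinPauseSketch, observedNextStart, expectedNextStart) are hypothetical, and the timestamps are the ones quoted in the post.

```java
import java.time.Duration;
import java.time.LocalTime;

public class MinPauseSketch {

    // Hypothetical model of the OBSERVED behavior: the pause is applied
    // only after a checkpoint that completed successfully.
    static LocalTime observedNextStart(LocalTime end, boolean succeeded, Duration minPause) {
        return succeeded ? end.plus(minPause) : end;
    }

    // Hypothetical model of the behavior the poster EXPECTED: the pause is
    // applied after every checkpoint attempt, failed or not.
    static LocalTime expectedNextStart(LocalTime end, Duration minPause) {
        return end.plus(minPause);
    }

    public static void main(String[] args) {
        Duration minPause = Duration.ofSeconds(15);
        Duration timeout = Duration.ofSeconds(30);

        // Checkpoint 1: triggered at 18:03:11, failed by timeout 30 s later.
        LocalTime firstEnd = LocalTime.of(18, 3, 11).plus(timeout);
        System.out.println("failed at:            " + firstEnd);
        System.out.println("observed next start:  " + observedNextStart(firstEnd, false, minPause));
        System.out.println("expected next start:  " + expectedNextStart(firstEnd, minPause));

        // Checkpoint 4: completed successfully at 18:04:56.
        LocalTime fourthEnd = LocalTime.of(18, 4, 56);
        System.out.println("next after success:   " + observedNextStart(fourthEnd, true, minPause));
    }
}
```

Under these assumptions the model gives 18:03:41 for the observed start after the failed checkpoint, 18:03:56 for the expected one, and 18:05:11 for the start after the successful 4th checkpoint.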

Please see attached screenshots for configuration and checkpoint history.

My question is: is this expected behavior or a bug? Is there a way to have a pause between checkpoints even if a checkpoint fails by timeout?

Thank you!

--
Kind regards,
Dmitry Minaev



CONFIDENTIALITY NOTICE: This e-mail and any files attached may contain confidential information of Five9 and/or its affiliated entities. Access by the intended recipient only is authorized. Any liability arising from any party acting, or refraining from acting, on any information contained in this e-mail is hereby excluded. If you are not the intended recipient, please notify the sender immediately, destroy the original transmission and its attachments and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Copyright in this e-mail and any attachments belongs to Five9 and/or its affiliated entities.

Screen Shot 2018-05-14 at 6.06.41 PM.png (71K)
Screen Shot 2018-05-14 at 6.06.24 PM.png (67K)

Re: minPauseBetweenCheckpoints for failed checkpoints

Timo Walther
Hi Dmitry,

I think minPauseBetweenCheckpoints is intended as a pause between successful checkpoints. After a failure, a user usually wants to get a successful checkpoint again as quickly as possible. Stefan (in CC) might know more about this.

Regards,
Timo

On 15.05.18 at 03:28, Dmitry Minaev wrote: