Checkpoint not maintaining minimum pause duration between checkpoints

Checkpoint not maintaining minimum pause duration between checkpoints

Ravi Bhushan Ratnakar
Hi All,

I am running a streaming job with Flink 1.11.0 on Kubernetes. I have configured checkpointing as follows:
Interval - 3 minutes
Minimum pause between checkpoints - 3 minutes
Checkpoint timeout - 10 minutes
Checkpointing Mode - Exactly Once

Other configs
Time Characteristics - Processing Time
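
For reference, the checkpoint settings listed above could be expressed in `flink-conf.yaml` roughly as follows. This is only a sketch: the post does not say whether the job applies them in code (via `CheckpointConfig`) or through configuration, and the keys are assumed to be the `execution.checkpointing.*` options documented for Flink 1.11.

```yaml
# Sketch only: the original post does not say how the settings were applied.
execution.checkpointing.interval: 3 min
execution.checkpointing.min-pause: 3 min
execution.checkpointing.timeout: 10 min
execution.checkpointing.mode: EXACTLY_ONCE
```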

I am observing unusual behaviour. When a checkpoint completes successfully and its end-to-end duration is almost equal to or greater than the minimum pause duration, the next checkpoint is triggered immediately, without honouring the minimum pause. Please see checkpoint ID 194 onward in the attached screenshot.
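
To make the expectation concrete, here is a minimal sketch of the timing rule as it would follow from the documented meaning of the minimum pause. It is not Flink's actual `CheckpointCoordinator` code; the class and method names (`MinPauseModel`, `earliestNextTrigger`) are hypothetical, and it assumes at most one concurrent checkpoint and a pause measured from the completion of the previous checkpoint.

```java
import java.time.Duration;

/** Hypothetical model of the expected scheduling rule; not Flink's CheckpointCoordinator. */
public class MinPauseModel {

    /**
     * Earliest time (as an offset from job start) at which the next checkpoint
     * should be triggered, assuming at most one concurrent checkpoint.
     * The next trigger has to respect both the periodic interval (counted from
     * the previous trigger) and the minimum pause (counted from the previous completion).
     */
    public static Duration earliestNextTrigger(Duration lastTrigger, Duration lastCompletion,
                                               Duration interval, Duration minPause) {
        Duration byInterval = lastTrigger.plus(interval);
        Duration byMinPause = lastCompletion.plus(minPause);
        // Take the later of the two candidate times.
        return byInterval.compareTo(byMinPause) >= 0 ? byInterval : byMinPause;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofMinutes(3);
        Duration minPause = Duration.ofMinutes(3);

        // A checkpoint triggered at t = 0 that takes 3 minutes end to end.
        Duration trigger = Duration.ZERO;
        Duration completion = Duration.ofMinutes(3);

        Duration next = earliestNextTrigger(trigger, completion, interval, minPause);
        System.out.println("Expected next trigger at minute " + next.toMinutes());
    }
}
```

Under this model, with the settings above, a checkpoint that takes 3 minutes should be followed by the next trigger no earlier than minute 6. The behaviour described here corresponds to a trigger at roughly minute 3, that is, right after the previous checkpoint completes.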

image.png

Regards,
Ravi

Re: Checkpoint not maintaining minimum pause duration between checkpoints

Congxian Qiu
Hi Ravi,

What is the value of max concurrent checkpoints? If it is 1, then I think this behaviour is a problem, because the Javadoc of `CheckpointConfig` says: `If the max number of concurrent checkpoints is set to one, this setting makes effectively sure that a minimum amount of time passes where no checkpoint is in progress at all.`

Best,
Congxian


In reply to Ravi Bhushan Ratnakar <[hidden email]>, who wrote on Tue, 21 Jul 2020 at 5:25 PM.

Re: Checkpoint not maintaining minimum pause duration between checkpoints

Xiangyu Su
Hi Congxian,

Thank you for your response. I am Ravi's colleague. We have 'maxConcurrentCheckpoints' set to 1, if that is what you mean.

Best,
Xiangyu

In reply to Congxian Qiu <[hidden email]>, who wrote on Wed, 22 Jul 2020 at 08:17.

--
Xiangyu Su
Java Developer
[hidden email]

Smaato Inc.
San Francisco - New York - Hamburg - Singapore
www.smaato.com


Re: Checkpoint not maintaining minimum pause duration between checkpoints

Ravi Bhushan Ratnakar
Hi Congxian,

Thanks for your reply. As my colleague Xiangyu has already mentioned, we are using 1 for max concurrent checkpoints.

I have created a Jira ticket, FLINK-18675, for this issue.


Regards, 
Ravi 

In reply to Xiangyu Su <[hidden email]>, who wrote on Wed, 22 Jul 2020 at 10:18.

Re: Checkpoint not maintaining minimum pause duration between checkpoints

Congxian Qiu
Hi Ravi

Thanks for creating the issue. We can continue the discussion on Jira.

Best,
Congxian


In reply to Ravi Bhushan Ratnakar <[hidden email]>, who wrote on Thu, 23 Jul 2020 at 12:52 AM.