Re: Making job fail on Checkpoint Expired?

Posted by Congxian Qiu on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Making-job-fail-on-Checkpoint-Expired-tp34051p34061.html

Currently, only declined checkpoints are counted in `continuousFailureCounter`.
Could you please share why you want the job to fail when a checkpoint expires?

Best,
Congxian


On Thu, Apr 2, 2020 at 11:23 PM, Timo Walther <[hidden email]> wrote:
Hi Robin,

This is a very good observation, and the behavior may even be unintended.
Maybe Arvid (in CC) is more familiar with checkpointing?

Regards,
Timo


On 02.04.20 15:37, Robin Cassan wrote:
> Hi all,
>
> I am wondering if there is a way to make a Flink job fail (not cancel
> it) when one or several checkpoints have failed due to being expired
> (taking longer than the timeout)?
> I am using Flink 1.9.2 and have set
> `setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick.
> Looking into the CheckpointFailureManager.java class, it looks like this
> only works when the checkpoint failure reason is
> `CHECKPOINT_DECLINED`, but the number of failures isn't incremented on
> `CHECKPOINT_EXPIRED`.
> Am I missing something?
>
> Thanks!
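
For readers following the thread, the counting behaviour described above can be modelled with a small, self-contained sketch. This is not Flink's `CheckpointFailureManager`; the class and method names are hypothetical, and only the two failure reasons and the `continuousFailureCounter` name come from the messages above. The sketch also assumes that a successful checkpoint resets the counter and that the job fails once the counter exceeds the tolerable number; neither detail is stated in the thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the behaviour discussed in this thread.
// It is NOT Flink code; it only illustrates why expired checkpoints never
// trip the tolerable-failure threshold.
class CheckpointFailureCounterSketch {

    // The two failure reasons named in the thread.
    enum Reason { CHECKPOINT_DECLINED, CHECKPOINT_EXPIRED }

    private final int tolerableFailures;
    private final AtomicInteger continuousFailureCounter = new AtomicInteger(0);

    CheckpointFailureCounterSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    // Records a failed checkpoint and reports whether the job should now fail.
    boolean onCheckpointFailure(Reason reason) {
        if (reason == Reason.CHECKPOINT_DECLINED) {
            // Only declined checkpoints are counted.
            continuousFailureCounter.incrementAndGet();
        }
        // CHECKPOINT_EXPIRED leaves the counter untouched.
        // Assumption: the job fails once the counter exceeds the threshold.
        return continuousFailureCounter.get() > tolerableFailures;
    }

    // Assumption: a completed checkpoint resets the streak of failures.
    void onCheckpointSuccess() {
        continuousFailureCounter.set(0);
    }

    int currentCount() {
        return continuousFailureCounter.get();
    }

    public static void main(String[] args) {
        CheckpointFailureCounterSketch manager = new CheckpointFailureCounterSketch(1);

        // Any number of expired checkpoints is ignored by the counter.
        for (int i = 0; i < 5; i++) {
            boolean fail = manager.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED);
            System.out.println("expired  -> count=" + manager.currentCount() + ", failJob=" + fail);
        }

        // Declined checkpoints are counted and eventually exceed the threshold.
        for (int i = 0; i < 2; i++) {
            boolean fail = manager.onCheckpointFailure(Reason.CHECKPOINT_DECLINED);
            System.out.println("declined -> count=" + manager.currentCount() + ", failJob=" + fail);
        }
    }
}
```

In this model, no number of expired checkpoints ever makes `onCheckpointFailure` return true. That matches what Robin observed with `setTolerableCheckpointFailureNumber(1)` on Flink 1.9.2.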