Re: Checkpoint fail due to timeout

Posted by Roman Khachatryan on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Checkpoint-fail-due-to-timeout-tp42125p42179.html

Hello,

There are several possible causes, such as back-pressure, large
snapshots, or bugs.

Could you please share:
- the stats of the previous (successful) checkpoints
- back-pressure metrics for sources
- the Flink version you are using
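
In case it helps with collecting the checkpoint stats, below is a rough
sketch that reads them from the JobManager REST endpoint
(/jobs/<job-id>/checkpoints) and prints one line per recent checkpoint.
The helper names (fetch_checkpoint_stats, summarize) are made up for this
example, and the JSON field names are assumptions based on the usual shape
of that response; they may differ between Flink versions, so please check
them against your own output.

```python
import json
from urllib.request import urlopen


def fetch_checkpoint_stats(base_url, job_id):
    """GET /jobs/<job_id>/checkpoints from the JobManager REST endpoint."""
    with urlopen(f"{base_url}/jobs/{job_id}/checkpoints") as resp:
        return json.load(resp)


def summarize(stats, limit=10):
    """Return one line per recent checkpoint: id, status, size, duration, acks.

    Assumed fields (verify for your version): history[].id, status,
    state_size (bytes), end_to_end_duration (ms), num_subtasks,
    num_acknowledged_subtasks.
    """
    lines = []
    for cp in stats.get("history", [])[:limit]:
        size_mb = cp.get("state_size", 0) / (1024 * 1024)
        duration_s = cp.get("end_to_end_duration", 0) / 1000
        lines.append(
            f"{cp.get('id')} {cp.get('status')} "
            f"{size_mb:.1f}MB {duration_s:.1f}s "
            f"acks={cp.get('num_acknowledged_subtasks')}/{cp.get('num_subtasks')}"
        )
    return lines
```

For example, print("\n".join(summarize(fetch_checkpoint_stats(
"http://localhost:8081", "<job-id>")))), where the address and job id are
placeholders for your setup. The same numbers are also shown in the
Checkpoints tab of the web UI, so a screenshot of that tab works just as
well.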

Regards,
Roman


On Thu, Mar 11, 2021 at 7:03 AM Alexey Trenikhun <[hidden email]> wrote:
>
> Hello,
> We are experiencing a problem with checkpoints failing due to timeout (already set to 30 minutes, still failing). The checkpoints were not too big before they started to fail, around 1.2 GB. It looks like one of the sources (Kafka) never acknowledged (see attached screenshot). What could be the reason?
>
> Thanks,
> Alexey