How do I investigate checkpoint failures?

Alexander Smirnov
Hello,

I have a cluster with multiple jobs running on it. One of the jobs has checkpoints that fail constantly (see attached image.png).

How do I investigate it? 

Thank you,
Alex
Re: How do I investigate checkpoint failures?

Dawid Wysakowicz-2

Hi Alex,

The first thing to do in such cases is to go through the JobManager and TaskManager logs and look for exceptions there.
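As a rough illustration of that first pass over the logs, here is a minimal sketch in plain Java that prints every line mentioning an exception or logged at ERROR level. The class name `LogScan` and the two match strings are assumptions made for this example; they are not taken from Flink, and your log layout may need different patterns.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
class LogScan {

    /** Returns the lines that mention an exception or were logged at ERROR level. */
    static List<String> findSuspectLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // Assumed patterns: adjust them to the log format you actually use.
            if (line.contains("Exception") || line.contains(" ERROR ")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path to one JobManager or TaskManager log file.
        for (String arg : args) {
            Path path = Paths.get(arg);
            for (String hit : findSuspectLines(Files.readAllLines(path))) {
                System.out.println(path.getFileName() + ": " + hit);
            }
        }
    }
}
```

Run it with the log file paths as arguments. A plain `grep` over the log directory does the same job; the point is only to narrow the logs down to the entries around the failing checkpoints.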

The cause shown for the latest failed checkpoint says the checkpoint expired, meaning it did not complete within the configured checkpoint timeout. You can try increasing the checkpoint timeout; more configuration options for checkpoints are described in [1].
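For reference, a minimal sketch of raising the timeout in the job code, assuming the DataStream API described in [1]. The class name `CheckpointTimeoutSketch` is hypothetical, and the interval and timeout values are placeholders, not recommendations; pick them based on how long your checkpoints actually take.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; only the checkpoint settings matter here.
public class CheckpointTimeoutSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (placeholder value).
        env.enableCheckpointing(60_000L);

        // Allow each checkpoint up to 20 minutes before it is declared
        // expired (placeholder value, in milliseconds).
        env.getCheckpointConfig().setCheckpointTimeout(20 * 60 * 1000L);

        // The job's sources, operators, sinks and env.execute(...) would
        // follow here; they are omitted from this sketch.
    }
}
```

A longer timeout only hides the symptom if checkpoints are slow for another reason, so it is still worth checking the logs to see why they take that long.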

Best,

Dawid


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing


On 21/08/18 09:10, Alexander Smirnov wrote:
> Hello,
>
> I have a cluster with multiple jobs running on it. One of the jobs has checkpoints constantly failing
> image.png
>
> How do I investigate it?
>
> Thank you,
> Alex

