Hi Alex,
The first thing to do in such cases is to check the JobManager and
TaskManager logs and look for exceptions there.
The cause reported for the latest failed checkpoint is that the
checkpoint expired, i.e. it did not complete within the configured
timeout. You can try increasing the checkpoint timeout (more
configuration options for checkpoints are described here [1]).
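
As a rough sketch, assuming the job is configured through the Java
DataStream API, the timeout could be raised along these lines. The class
name, the 60 s interval and the 30 min timeout are placeholder values
for illustration, not recommendations:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTimeoutExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 s (placeholder value).
        env.enableCheckpointing(60_000L);

        CheckpointConfig config = env.getCheckpointConfig();
        // A checkpoint that takes longer than this is declared expired.
        // The default is 10 minutes; 30 minutes here is a placeholder value.
        config.setCheckpointTimeout(30 * 60 * 1000L);

        // Define sources, transformations and sinks here, then call
        // env.execute(...) to submit the job.
    }
}
```

A longer timeout only hides the symptom if checkpoints are slow for a
reason such as backpressure or very large state, so it is still worth
looking at the logs first.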
Best,
Dawid
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing
On 21/08/18 09:10, Alexander Smirnov wrote:
Hello,
I have a cluster with multiple jobs running on it. Checkpoints for
one of the jobs are constantly failing.
How do I investigate this?
Thank you,
Alex