Hi,
What Flink version are you using, and in what scenario does this happen? It could be a number of things, most likely that the file mounted under:
> /mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a
disappeared or stopped being accessible. For example, something like this [1] (which is not a Flink bug).
Have you tried looking for this path manually? Does it exist? Have you looked in the JobManager/TaskManager logs for all entries that are referring to this path?
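If it helps, here is a rough sketch of those two checks in Python. It is only an illustration: it assumes the script runs somewhere the checkpoint volume is mounted, and that the JobManager/TaskManager logs are written to files in a directory such as /opt/flink/log (on Kubernetes you may only have `kubectl logs` output, in which case save that to a file first). The helper names `check_path` and `find_log_mentions` are made up for this example.

```python
import os
from pathlib import Path

# Path copied from the error message.
CHECKPOINT_FILE = (
    "/mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa"
    "/shared/47993871-c7eb-4fec-ae23-207d307c384a"
)


def check_path(path):
    """Report whether the path exists and is readable by this process."""
    return {"exists": os.path.exists(path), "readable": os.access(path, os.R_OK)}


def find_log_mentions(log_dir, needle):
    """Return (file name, line number, line) for each log line containing needle."""
    hits = []
    for log_file in sorted(Path(log_dir).glob("*.log*")):
        if not log_file.is_file():
            continue
        with open(log_file, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((log_file.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    print(check_path(CHECKPOINT_FILE))
    # Assumption: logs live under FLINK_LOG_DIR, falling back to /opt/flink/log.
    log_dir = os.environ.get("FLINK_LOG_DIR", "/opt/flink/log")
    # Search for the file name only, since logs may print the path with a scheme prefix.
    for hit in find_log_mentions(log_dir, os.path.basename(CHECKPOINT_FILE)):
        print(hit)
```

Running this on every TaskManager pod as well as the JobManager is worthwhile, since the mount may be missing on only some of them.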
To help you, we would need more information. If it happened after taking a savepoint, this could be a recently fixed issue [2]. If you are using SQL (Blink planner), it could be, for example, this [3].
Piotrek
Hello,
I ran a Flink job in a Kubernetes Application cluster w/ four TaskManagers. The job was running fine for several hours but then crashed w/ the following exception, which seems to occur when restoring from a checkpoint. The UI shows the following for the checkpoint counts:
Triggered: 68, In Progress: 0, Completed: 67, Failed: 1, Restored: 292
Any ideas about this failure?
Thanks