(DEPRECATED) Apache Flink User Mailing List archive.

Flink failing to restore from checkpoint


Claude Murad

Flink failing to restore from checkpoint

Hello,

I ran a Flink job in a Kubernetes Application cluster with four TaskManagers. The job ran fine for several hours but then crashed with the following exception, which seems to occur while restoring from a checkpoint. The UI shows the following checkpoint counts:

Triggered: 68, In Progress: 0, Completed: 67, Failed: 1, Restored: 292

Any ideas about this failure?

Thanks

FlinkCheckpointFailure.txt (6K) Download Attachment

Piotr Nowojski-4

Re: Flink failing to restore from checkpoint

Hi,

Which Flink version are you using, and what scenario leads to the failure? It could be a number of things; most likely the file mounted under:

> /mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa/shared/47993871-c7eb-4fec-ae23-207d307c384a

disappeared or stopped being accessible. For example, it could be something like this [1] (this is not a Flink bug).

Have you tried looking for this path manually? Does it exist? Have you looked in the JobManager/TaskManager logs for all entries that are referring to this path?
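The two checks above can be sketched in a few lines of Python, run on a machine or pod where the checkpoint volume is mounted. This is only a rough sketch: the log directory `/opt/flink/log` and the helper names `check_path` and `find_mentions` are assumptions for illustration, not something taken from your setup, so adjust them to wherever your JobManager/TaskManager logs actually end up.

```python
import os
from pathlib import Path


def check_path(path):
    """Report whether the path exists and is readable by the current user."""
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
    }


def find_mentions(log_files, needle):
    """Yield (file name, line number, line) for every log line containing needle."""
    for log_file in log_files:
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(log_file, "r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield (str(log_file), number, line.rstrip("\n"))


if __name__ == "__main__":
    # Path taken from the exception quoted above.
    state_file = (
        "/mnt/checkpoints/5dde50b6e70608c63708cbf979bce4aa"
        "/shared/47993871-c7eb-4fec-ae23-207d307c384a"
    )
    # Assumption: logs are written to files in this directory.
    log_dir = Path("/opt/flink/log")

    print(check_path(state_file))
    if log_dir.is_dir():
        for name, number, line in find_mentions(sorted(log_dir.glob("*.log*")), state_file):
            print(f"{name}:{number}: {line}")
```

If the logs only go to stdout, the same search can be done on the output of `kubectl logs` for each pod instead of on files.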

To help you, we would need more information. If it happened after taking a savepoint, this could be a recently fixed issue [2]. If you are using SQL (Blink planner), it could be, for example, this [3].

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-16470

[2] https://issues.apache.org/jira/browse/FLINK-21351

[3] https://issues.apache.org/jira/browse/FLINK-20665

On Mon, 29 Mar 2021 at 14:58, Claude M <[hidden email]> wrote:

> Hello,
>
> I ran a Flink job in a Kubernetes Application cluster with four TaskManagers. The job ran fine for several hours but then crashed with the following exception, which seems to occur while restoring from a checkpoint. The UI shows the following checkpoint counts:
>
> Triggered: 68, In Progress: 0, Completed: 67, Failed: 1, Restored: 292
>
> Any ideas about this failure?
>
> Thanks