My Flink job failed to checkpoint with a "The job has failed" error. The logs contained no other recent errors. I keep hitting the error even if I cancel the jobs and restart them. When I restarted my jobmanager and taskmanager, the error went away.
What error am I hitting? It looks like there is bad state that lives outside the scope of a job. How often do people restart their jobmanagers and taskmanagers to deal with errors like this?
Hi Dan,
can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using)
On Mon, Apr 26, 2021 at 7:20 AM Dan Hill <[hidden email]> wrote:
Hi Dan,
I think you might be using an older version of Flink. This problem has been resolved by FLINK-16753 [1] after Flink 1.10.3.
Best
Yun Tang
From: Robert Metzger <[hidden email]>
Sent: Monday, April 26, 2021 14:46
To: Dan Hill <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: Checkpoint error - "The job has failed"
Hi Dan,
can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using)
Hey Yun and Robert,
I'm using Flink v1.11.1.
Robert, I'll send you a separate email with the logs.
On Mon, Apr 26, 2021 at 12:46 AM Yun Tang <[hidden email]> wrote:
Hi Dan,
If you check the "Fix Versions" field of FLINK-16753 [1], you can see that this bug is resolved after 1.11.3, not 1.11.1.
Best
Yun Tang
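The version comparison discussed above can be sketched in a few lines of Python. This is only an illustration: the helper names and the table of fix versions are hypothetical, and treating 1.10.3 and 1.11.3 as the first fixed release on their respective lines is an assumption drawn from this thread. The "Fix Versions" field of FLINK-16753 remains the authoritative source.

```python
def parse_version(text):
    """Turn a version string such as 'v1.11.1' into a tuple of ints, e.g. (1, 11, 1)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# ASSUMPTION: first release on each minor line that contains the fix,
# taken from the versions mentioned in this thread, not from Jira itself.
ASSUMED_FIX_VERSIONS = {
    (1, 10): (1, 10, 3),
    (1, 11): (1, 11, 3),
}


def has_fix(version_text):
    """Return True if the given Flink version is assumed to include the fix."""
    version = parse_version(version_text)
    minor_line = version[:2]
    if minor_line in ASSUMED_FIX_VERSIONS:
        # Tuples compare element by element, so (1, 11, 1) < (1, 11, 3).
        return version >= ASSUMED_FIX_VERSIONS[minor_line]
    # Lines newer than every listed line are assumed fixed; older lines are not.
    return minor_line > max(ASSUMED_FIX_VERSIONS)


if __name__ == "__main__":
    for candidate in ("v1.11.1", "1.11.3", "1.12.0"):
        print(candidate, has_fix(candidate))
```

Under these assumptions, v1.11.1 is reported as not containing the fix, while 1.11.3 and the 1.12 line are.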
From: Dan Hill <[hidden email]>
Sent: Tuesday, April 27, 2021 7:50
To: Yun Tang <[hidden email]>
Cc: Robert Metzger <[hidden email]>; user <[hidden email]>
Subject: Re: Checkpoint error - "The job has failed"
Hey Yun and Robert,
I'm using Flink v1.11.1.
Robert, I'll send you a separate email with the logs.
Oh interesting. Yea, could be. We'll soon update to v1.12.
Thanks Robert and Yun!
On Wed, Apr 28, 2021 at 1:30 AM Yun Tang <[hidden email]> wrote: