Savepoint process recovery in Jobmanager HA setup


Savepoint process recovery in Jobmanager HA setup

Bajaj, Abhinav



I am testing a scenario that triggers a savepoint on a Flink 1.7.1 job deployed in JobManager HA mode.

The purpose is to check whether the savepoint process recovers if the leader JobManager fails during the savepoint.


During my testing, I found that the new leader JobManager returns the following error for the savepoint trigger request:

{"errors":["Operation not found under key:"]}


Does Flink support savepoint process recovery in a JobManager HA setup?

If yes, could you please suggest how to find the savepoint request?


Appreciate your time and help.


~ Abhinav Bajaj


Re: Savepoint process recovery in Jobmanager HA setup

Yun Tang
Hi Abhinav

If the leader JobManager fails during a savepoint, that savepoint fails, and the new JobManager restores the job from the previous job graph and the latest completed checkpoint in the high-availability storage. That is why the new JobManager knows nothing about the previous savepoint.
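
The behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Flink's code: `HaStore`, `JobManager`, `pending_savepoints` and `restore_point` are hypothetical names, and the model simply assumes that in-flight savepoint requests live only in the leader's memory while the job graph and completed checkpoints live in the HA storage.

```python
class HaStore:
    """Hypothetical stand-in for HA storage: it survives a JobManager failure."""

    def __init__(self):
        self.job_graph = "job-graph"
        self.completed_checkpoints = []


class JobManager:
    """Hypothetical leader process; its in-memory state is lost on failure."""

    def __init__(self, ha_store):
        self.ha = ha_store
        # Assumption: in-flight savepoint requests are tracked in memory only.
        self.pending_savepoints = {}

    def trigger_savepoint(self, trigger_id):
        self.pending_savepoints[trigger_id] = "IN_PROGRESS"
        return trigger_id

    def savepoint_status(self, trigger_id):
        if trigger_id not in self.pending_savepoints:
            raise KeyError(f"Operation not found under key: {trigger_id}")
        return self.pending_savepoints[trigger_id]

    def restore_point(self):
        # A new leader can only see what was persisted in the HA store.
        checkpoints = self.ha.completed_checkpoints
        return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    ha = HaStore()
    ha.completed_checkpoints.append("chk-41")

    old_leader = JobManager(ha)
    old_leader.trigger_savepoint("abc")

    # The old leader dies; a new leader starts with empty in-memory state.
    new_leader = JobManager(ha)
    print("restores from:", new_leader.restore_point())
    try:
        new_leader.savepoint_status("abc")
    except KeyError as err:
        print("lookup failed:", err)
```

In this model the new leader still finds the latest completed checkpoint, but the lookup of the old trigger id fails because that entry was never persisted.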

Yun Tang

From: Bajaj, Abhinav <[hidden email]>
Sent: Saturday, July 27, 2019 7:25
To: [hidden email] <[hidden email]>
Subject: Savepoint process recovery in Jobmanager HA setup





Re: Savepoint process recovery in Jobmanager HA setup

Bajaj, Abhinav

Thanks very much for your response.

I suspected the same and just wanted to confirm.


I guess the best way forward for now is to request the savepoint again.
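
Requesting the savepoint again can be automated on the client side. The following is a minimal sketch, not a tested integration: it assumes the Flink REST endpoints `POST /jobs/:jobid/savepoints` (returning a `request-id`) and `GET /jobs/:jobid/savepoints/:triggerid`, and it assumes a lost trigger shows up as HTTP 404 or as an `errors` entry containing "Operation not found". The helper names (`http_json`, `trigger_savepoint`, `poll_savepoint`, `savepoint_with_retry`) and the `base_url` value are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


class OperationNotFound(Exception):
    """The JobManager answering the request does not know the trigger id."""


def http_json(method, url, body=None):
    """Send a JSON request and return (status_code, parsed_body)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read().decode("utf-8") or "{}")


def trigger_savepoint(base_url, job_id, target_dir, request=http_json):
    """Ask the current leader to start a savepoint; return the trigger id."""
    _, body = request(
        "POST", f"{base_url}/jobs/{job_id}/savepoints",
        {"target-directory": target_dir, "cancel-job": False})
    return body["request-id"]


def poll_savepoint(base_url, job_id, trigger_id, request=http_json,
                   interval=2.0):
    """Poll until the savepoint operation completes; return its result."""
    url = f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"
    while True:
        status, body = request("GET", url)
        errors = body.get("errors", [])
        if status == 404 or any("Operation not found" in e for e in errors):
            raise OperationNotFound(trigger_id)
        if body["status"]["id"] == "COMPLETED":
            return body["operation"]
        time.sleep(interval)


def savepoint_with_retry(base_url, job_id, target_dir, request=http_json,
                         max_attempts=3, interval=2.0):
    """Trigger a savepoint and re-trigger it if the trigger id is lost."""
    for _ in range(max_attempts):
        trigger_id = trigger_savepoint(base_url, job_id, target_dir, request)
        try:
            operation = poll_savepoint(
                base_url, job_id, trigger_id, request, interval)
        except OperationNotFound:
            continue  # e.g. the leader changed; request the savepoint again
        if "location" in operation:
            return operation["location"]
        raise RuntimeError(f"savepoint failed: {operation.get('failure-cause')}")
    raise RuntimeError("savepoint did not complete after retries")
```

A call might look like `savepoint_with_retry("http://jobmanager:8081", job_id, "hdfs:///savepoints")`, where the URL and target directory are placeholders. The sketch does not handle connection errors while a new leader is still being elected, or a REST address that changes after failover; both would need handling in a real setup.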


~ Abhi


From: Yun Tang <[hidden email]>
Date: Saturday, July 27, 2019 at 7:35 AM
To: "Bajaj, Abhinav" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: Savepoint process recovery in Jobmanager HA setup

