Savepoint process recovery in Jobmanager HA setup

Savepoint process recovery in Jobmanager HA setup

Bajaj, Abhinav

Hi,

I am trying to test a scenario that triggers a savepoint on a Flink 1.7.1 job deployed in JobManager HA mode.

The purpose is to check whether the savepoint process recovers if the leader JobManager fails while the savepoint is in progress.

During my testing, I found that the new leader JobManager returns the following error for the savepoint trigger request:

{"errors":["Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@e287af3"]}
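For reference, here is a minimal Python sketch of the client side of this flow. It assumes the Flink REST endpoints `POST /jobs/:jobid/savepoints` (which answers with a `request-id`, the trigger ID) and `GET /jobs/:jobid/savepoints/:triggerid` (which reports the savepoint status); please check both against the REST API docs for your Flink version. The helper names and the `LOST` state are hypothetical, added only to show how a client might classify a status response, including the error above, which a newly elected leader returns for a trigger ID it has no record of.

```python
import json
from typing import Tuple

# Hypothetical helpers. The paths follow the Flink REST API for savepoints,
# but verify them against the documentation for your Flink version.


def trigger_path(job_id: str) -> str:
    """Path for triggering a savepoint (POST with a JSON body such as
    {"target-directory": "...", "cancel-job": false})."""
    return f"/jobs/{job_id}/savepoints"


def status_path(job_id: str, trigger_id: str) -> str:
    """Path for polling the status of a previously triggered savepoint (GET)."""
    return f"/jobs/{job_id}/savepoints/{trigger_id}"


def classify_status(http_status: int, body: str) -> Tuple[str, str]:
    """Map a status-poll response to (state, detail).

    "LOST" is a hypothetical client-side state for a trigger ID that the
    current leader does not know, e.g. after a JobManager failover.
    """
    payload = json.loads(body)
    errors = payload.get("errors", [])
    lost = [e for e in errors if "Operation not found" in e]
    if lost:
        return "LOST", lost[0]
    if http_status != 200:
        return "ERROR", "; ".join(errors) or f"HTTP {http_status}"
    status_id = payload["status"]["id"]  # "IN_PROGRESS" or "COMPLETED"
    if status_id == "IN_PROGRESS":
        return "IN_PROGRESS", ""
    operation = payload.get("operation", {})
    if "location" in operation:
        return "COMPLETED", operation["location"]  # savepoint path
    return "FAILED", str(operation.get("failure-cause", "unknown"))
```

Matching on the error text rather than on the HTTP status code is a deliberately conservative choice here, since the exact status code returned for an unknown trigger ID is not stated in this thread.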

Does Flink support savepoint process recovery in a JobManager HA setup?

If so, could you please suggest how to locate the savepoint request after the failover?

I appreciate your time and help.

~ Abhinav Bajaj


Re: Savepoint process recovery in Jobmanager HA setup

Yun Tang
Hi Abhinav

If the leader JobManager fails during a savepoint, that savepoint fails. The new leader JobManager then restores the job from the previously submitted JobGraph and the latest completed checkpoint in the high-availability storage. That is why the new JobManager knows nothing about the previous savepoint.


Best
Yun Tang

From: Bajaj, Abhinav <[hidden email]>
Sent: Saturday, July 27, 2019 7:25
To: [hidden email] <[hidden email]>
Subject: Savepoint process recovery in Jobmanager HA setup


Re: Savepoint process recovery in Jobmanager HA setup

Bajaj, Abhinav

Thanks very much for your response.

I suspected the same and just wanted to confirm.

I guess the best way forward for now is to trigger the savepoint again.
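If the trigger ID is lost, the only option is to trigger a new savepoint, as suggested above. The loop below is a minimal sketch of that approach, assuming hypothetical `trigger` and `poll` callables that wrap the REST calls: `trigger()` returns a new trigger ID, and `poll(trigger_id)` returns a `(state, detail)` pair whose state is one of `IN_PROGRESS`, `COMPLETED`, `FAILED`, or `LOST` (the current leader does not recognize the trigger ID). The attempt limit and poll interval are illustrative defaults, not Flink settings.

```python
import time
from typing import Callable, Tuple


def savepoint_with_retrigger(
    trigger: Callable[[], str],
    poll: Callable[[str], Tuple[str, str]],
    max_attempts: int = 3,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Trigger a savepoint and poll until it completes, re-triggering when
    the trigger ID is lost (e.g. after a JobManager failover).

    Returns the savepoint location. Raises RuntimeError if the savepoint
    fails, if an unexpected state is reported, or once max_attempts
    triggers have been used up.
    """
    for _ in range(max_attempts):
        trigger_id = trigger()
        while True:
            state, detail = poll(trigger_id)
            if state == "COMPLETED":
                return detail  # savepoint location
            if state == "FAILED":
                raise RuntimeError(f"savepoint failed: {detail}")
            if state == "LOST":
                # The new leader has no record of this trigger ID: start over.
                break
            if state != "IN_PROGRESS":
                raise RuntimeError(f"unexpected state {state}: {detail}")
            sleep(poll_interval_s)  # still in progress, wait and poll again
    raise RuntimeError(f"savepoint not completed after {max_attempts} attempts")
```

Injecting `sleep` keeps the loop testable without real delays. Re-triggering only helps once a new leader has restored the job; while leader election is still in progress, the wrapped REST calls themselves may fail and would need their own handling.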


~ Abhi

From: Yun Tang <[hidden email]>
Date: Saturday, July 27, 2019 at 7:35 AM
To: "Bajaj, Abhinav" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: Savepoint process recovery in Jobmanager HA setup