Job manager high availability job restarting

Giselle van Dongen

Hi!


We are running a highly available Flink cluster in standalone mode with ZooKeeper, with 2 JobManagers and 5 TaskManagers.

When the active JobManager is killed, the standby JobManager takes over, but the job is also restarted.

Is this the default behavior, and can we avoid job restarts on JobManager failure in some way?


Thank you,

Giselle

Re: Job manager high availability job restarting

Yang Wang
I think it is the expected behavior. When the active JobManager loses leadership, the standby one
will try to take over and recover the job from the latest successful checkpoint.

High availability only helps with leader election/retrieval and HA metadata storage (e.g. job graphs, checkpoints, etc.).
It cannot avoid job restarts on JobManager failure.
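
For illustration, here is a minimal sketch of the `flink-conf.yaml` settings involved, assuming a Flink 1.12-era standalone setup. The ZooKeeper hosts, storage paths, cluster id, and checkpoint interval are hypothetical placeholders and are not taken from the cluster described in this thread. Since the new leader restores the job from the latest successful checkpoint, a shorter checkpoint interval should reduce how much work is replayed after a failover, but it does not prevent the restart itself.

```yaml
# Sketch only: hosts, paths, and ids below are assumed placeholders.

# Leader election/retrieval and HA metadata via ZooKeeper
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-cluster

# Durable storage for job graphs and checkpoint metadata,
# reachable from both JobManagers
high-availability.storageDir: hdfs:///flink/ha/

# Periodic checkpoints that the new leader recovers the job from
execution.checkpointing.interval: 60s
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints/
```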

Best,
Yang

Giselle van Dongen <[hidden email]> wrote on Wed, Jan 6, 2021 at 6:23 AM:

Hi!


We are running a highly available Flink cluster in standalone mode with ZooKeeper, with 2 JobManagers and 5 TaskManagers.

When the jobmanager is killed, the standby jobmanager takes over. But the job is also restarted.

Is this the default behavior and can we avoid job restarts (for jobmanager failure) in some way?


Thank you,

Giselle