I think it is the expected behavior. When the active JobManager loses leadership, the standby one
will try to take over and recover the job from the latest successful checkpoint.
High availability only provides leader election/retrieval and HA metadata storage (e.g. job graphs, checkpoint metadata, etc.).
It cannot prevent a job restart when a JobManager fails.
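For reference, a minimal ZooKeeper HA setup in flink-conf.yaml looks roughly like the sketch below; the quorum addresses, storage path, and cluster id are placeholders for your environment:

    # Use ZooKeeper for leader election and HA metadata storage
    high-availability: zookeeper
    # ZooKeeper quorum (placeholder addresses)
    high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    # Durable storage for job graphs and checkpoint metadata (placeholder path)
    high-availability.storageDir: hdfs:///flink/ha/
    # Optional: distinguishes this cluster in ZooKeeper (placeholder id)
    high-availability.cluster-id: /my-standalone-cluster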
Best,
Yang
Hi!
We are running a highly available Flink cluster in standalone mode with ZooKeeper, with 2 JobManagers and 5 TaskManagers.
When the JobManager is killed, the standby JobManager takes over, but the job is also restarted.
Is this the default behavior, and can we avoid job restarts (for JobManager failures) in some way?
Thank you,
Giselle