Hi Suchithra,
Roman is right. You still need ZooKeeper HA configured so that the job can recover successfully when the JobManager fails over.
Although the job jar is bundled in the image, the checkpoint counter and path need to be stored in ZooKeeper. When the JobManager
terminates exceptionally and is relaunched by K8s, the job needs to recover from the latest checkpoint automatically.
Another reason is leader election and retrieval. In some corner cases, for example when the kubelet crashes, two JobManagers may be
running even though the Deployment's replica count is 1. We need ZooKeeper for leader election and leader retrieval so that the TaskManagers
can find the active JobManager.
Native K8s HA has been requested in FLINK-12884 [1]; I will try to get it implemented in the next major release (1.12). After that, the HA configuration
on K8s will be more convenient.
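For reference, the ZooKeeper HA setup described above goes into flink-conf.yaml along these lines. This is only a sketch: the quorum hosts, storage path, and cluster id are placeholder values, not anything from this thread.

```yaml
# Sketch of ZooKeeper HA options in flink-conf.yaml.
# Hostnames, bucket, and cluster id below are hypothetical examples.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
# Durable storage for checkpoint metadata and submitted job graphs:
high-availability.storageDir: s3://my-bucket/flink/ha/
# Isolates this job cluster's znodes from other clusters:
high-availability.cluster-id: /my-job-cluster
```

The storageDir must be on durable shared storage (S3, HDFS, etc.), since ZooKeeper only keeps pointers to the state, not the state itself.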
Best,
Yang
Hi Suchithra,
Yes, you need to pass these parameters to standalone-job.sh in the Kubernetes job definition.
I'm pulling in Patrick as he might know this subject better.
On Mon, Aug 3, 2020 at 12:24 PM V N, Suchithra (Nokia - IN/Bangalore) <
[hidden email]> wrote:
Hello,
I am using Flink version 1.10.1 in a Kubernetes environment. In Flink's per-job mode, do we need ZooKeeper and the HA parameters configured for the job to restart with HA? I ask because the job jar is part of the Docker image itself.
Thanks,
Suchithra