Per-job mode job restart and HA configuration

Per-job mode job restart and HA configuration

V N, Suchithra (Nokia - IN/Bangalore)

Hello,

I am using Flink version 1.10.1 in a Kubernetes environment. In Flink's per-job mode, do we need ZooKeeper and the HA parameters in order to restart the job and achieve HA? I am unsure, because the job jar is already part of the Docker image itself.

Thanks,

Suchithra

Re: Per-job mode job restart and HA configuration

r_khachatryan
Hi Suchithra,

Yes, you need to pass these parameters to standalone-job.sh in the Kubernetes job definition.
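
As a rough illustration of what passing the parameters could look like, here is a fragment of a JobManager container spec. It is a sketch under several assumptions: the image name, job class, ZooKeeper addresses and storage path are hypothetical, and it presumes an image whose entrypoint forwards a "job-cluster" command and its remaining arguments to standalone-job.sh, with Flink options supplied as -Dkey=value dynamic properties.

```yaml
# Fragment of the JobManager container spec (not a complete manifest).
containers:
  - name: flink-job-cluster
    image: my-flink-job:1.10.1            # hypothetical image with the job jar bundled
    args:
      - "job-cluster"                     # assumed to be forwarded to standalone-job.sh
      - "--job-classname"
      - "com.example.MyStreamingJob"      # hypothetical job entry class
      # HA parameters passed as dynamic properties:
      - "-Dhigh-availability=zookeeper"
      - "-Dhigh-availability.zookeeper.quorum=zk-0.zk:2181,zk-1.zk:2181,zk-2.zk:2181"  # hypothetical hosts
      - "-Dhigh-availability.storageDir=s3://my-bucket/flink/ha"                        # hypothetical path
```

The same keys can instead be placed in flink-conf.yaml inside the image or in a mounted ConfigMap, which keeps the args list short.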

I'm pulling in Patrick as he might know this subject better.

Regards,
Roman


Re: Per-job mode job restart and HA configuration

Yang Wang
Hi Suchithra,

Roman is right. You still need ZooKeeper HA configured so that the job can recover successfully when the JobManager fails over.
Although the job jar is bundled in the image, the checkpoint counter and the checkpoint path need to be stored in ZooKeeper. When the JobManager
terminates abnormally and is relaunched by K8s, the job needs to recover from the latest checkpoint automatically.

Another reason is leader election and retrieval. In some corner cases, for example when the kubelet has crashed, two JobManagers may be
running even though the replica count of the deployment is 1. We need ZooKeeper for leader election and leader retrieval so that the TaskManagers
can find the active JobManager.
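
To make the two reasons above concrete, a minimal flink-conf.yaml sketch might look like the following. The configuration keys are standard Flink options; the ZooKeeper quorum, storage locations and cluster id are hypothetical placeholders, and the storage directories are assumed to be on a filesystem reachable from every JobManager and TaskManager pod.

```yaml
# Enable ZooKeeper-based HA: used for leader election/retrieval and for
# keeping pointers to the latest completed checkpoint.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0.zk:2181,zk-1.zk:2181,zk-2.zk:2181   # hypothetical hosts

# Durable storage for JobManager metadata; ZooKeeper holds only references to it.
high-availability.storageDir: s3://my-bucket/flink/ha                         # hypothetical path

# Keeps this job cluster's ZooKeeper entries separate from other clusters.
high-availability.cluster-id: /my-per-job-cluster                             # hypothetical id

# Where checkpoint data is written, so a relaunched JobManager can restore from it.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints                       # hypothetical path
```

With settings along these lines, a JobManager pod restarted by K8s should be able to look up the latest checkpoint via ZooKeeper and resume the job, while TaskManagers discover whichever JobManager currently holds leadership.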

Native K8s HA is requested in FLINK-12884[1]. I will try to get it implemented in the next major release (1.12). After that, HA configuration
on K8s will be more convenient.




Best,
Yang
