Hi Suchithra,
Roman is right. You still need ZooKeeper HA configured so that the job can recover successfully when the JobManager fails over.
Although the job jar is bundled in the image, the checkpoint counter and path need to be stored in ZooKeeper. When the JobManager
terminates exceptionally and is relaunched by K8s, the job needs to recover from the latest checkpoint automatically.
Another reason is leader election and retrieval. In some corner cases, for example when the kubelet crashes, two JobManagers may be
running even though the Deployment's replica count is 1. We need ZooKeeper for leader election and leader retrieval so that the TaskManagers
can find the active JobManager.
Native K8s HA has been requested in FLINK-12884 [1]; I will try to get it implemented in the next major release (1.12). After that, the HA configuration
on K8s will be more convenient.
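For reference, the ZooKeeper HA setup described above goes into flink-conf.yaml along these lines. This is only a sketch: the quorum hosts, storage path, and cluster id are placeholder values, not anything from this thread.

```yaml
# Sketch of ZooKeeper HA options in flink-conf.yaml.
# Hostnames, bucket, and cluster id below are hypothetical examples.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
# Durable storage for checkpoint metadata and submitted job graphs:
high-availability.storageDir: s3://my-bucket/flink/ha/
# Isolates this job cluster's znodes from other clusters:
high-availability.cluster-id: /my-job-cluster
```

The storageDir must be on durable shared storage (S3, HDFS, etc.), since ZooKeeper only keeps pointers to the state, not the state itself.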
Best,
Yang
Hi Suchithra,
Yes, you need to pass these parameters to standalone-job.sh in the Kubernetes job definition.
I'm pulling in Patrick as he might know this subject better.
On Mon, Aug 3, 2020 at 12:24 PM V N, Suchithra (Nokia - IN/Bangalore) <
[hidden email]> wrote:
Hello,
I am using Flink version 1.10.1 in a Kubernetes environment. In Flink's per-job mode, do we need ZooKeeper and the HA parameters configured for the job to restart with HA? I ask because the job jar is part of the Docker image itself.
Thanks,
Suchithra