(DEPRECATED) Apache Flink User Mailing List archive.

Flink checkpointing state

Boris Lublinsky

Flink checkpointing state

This is from Flink 1.8:

“Job Manager keeps some state related to checkpointing in its memory. This state would be lost on Job Manager crashes, which is why this state is persisted in ZooKeeper. This means that even though there is no real need for the leader election and -discovery part of Flink’s HA mode (as this is handled natively by Kubernetes), it still needs to be enabled just for storing the checkpoint state.”

Was this ever fixed in Flink 1.10 or 1.11? When running Flink on Kubernetes without HA, there is no ZooKeeper. If the above is still the case, a restarted Job Manager will never pick up the right checkpoint.
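
For readers following along, here is a minimal sketch of the ZooKeeper-based HA settings that the quoted passage says must be enabled for checkpoint state to survive a Job Manager crash. The configuration keys are the standard Flink HA options; the helper function name, the quorum address, the storage path, and the cluster id are hypothetical placeholders, not values from this thread.

```python
def zookeeper_ha_conf(quorum, storage_dir, cluster_id):
    """Render flink-conf.yaml lines that enable ZooKeeper-based HA.

    ZooKeeper holds only pointers to the checkpoint metadata; the
    metadata itself is written to the durable storage directory.
    """
    entries = {
        "high-availability": "zookeeper",
        "high-availability.zookeeper.quorum": quorum,
        "high-availability.storageDir": storage_dir,
        "high-availability.cluster-id": cluster_id,
    }
    return "\n".join(f"{key}: {value}" for key, value in entries.items())


# Hypothetical values for illustration only.
print(zookeeper_ha_conf("zk-0.zk:2181,zk-1.zk:2181", "s3://flink/ha", "/my-job"))
```

The printed lines can be appended to flink-conf.yaml. Without them, a replacement Job Manager has no record of which checkpoint completed last.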

Yun Tang

Re: Flink checkpointing state

Hi Boris

Please refer to FLINK-12884 [1] for the current progress of native Kubernetes HA support, which is targeted for release 1.12.

[1] https://issues.apache.org/jira/browse/FLINK-12884
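
As a rough sketch of what the ZooKeeper-free setup looks like, the snippet below renders the options documented for the Kubernetes HA services that shipped with Flink 1.12. Treat the option names as an assumption relative to this thread, which predates that release; the helper function name, the cluster id, and the storage path are hypothetical.

```python
def kubernetes_ha_conf(cluster_id, storage_dir):
    """Render flink-conf.yaml lines for Kubernetes-based HA (no ZooKeeper).

    Assumes the option names documented for Flink 1.12. HA metadata is
    still written to durable storage; Kubernetes only keeps pointers to it.
    """
    entries = {
        "kubernetes.cluster-id": cluster_id,
        "high-availability": (
            "org.apache.flink.kubernetes.highavailability."
            "KubernetesHaServicesFactory"
        ),
        "high-availability.storageDir": storage_dir,
    }
    return "\n".join(f"{key}: {value}" for key, value in entries.items())


# Hypothetical values for illustration only.
print(kubernetes_ha_conf("my-flink-cluster", "s3://flink/recovery"))
```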

Best

Yun Tang

From: Boris Lublinsky <[hidden email]>
Sent: Tuesday, October 27, 2020 2:56
To: user <[hidden email]>
Subject: Flink checkpointing state

This is from Flink 1.8:

Was this ever fixed in Flink 1.10 or 1.11? When running Flink on Kubernetes without HA, there is no ZooKeeper. If the above is still the case, a restarted Job Manager will never pick up the right checkpoint.

Boris Lublinsky

Re: Flink checkpointing state

Thanks Yun,

This refers to FLIP-144: https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink

The FLIP has two parts, leader election and HA information persistence, and offers two options.

Can you tell us what exactly will be part of 1.12?

We would be happy with the second option for now, if it's faster to implement.

On Oct 27, 2020, at 1:11 AM, Yun Tang <[hidden email]> wrote:

Hi Boris

Please refer to FLINK-12884 [1] for the current progress of native Kubernetes HA support, which is targeted for release 1.12.

[1] https://issues.apache.org/jira/browse/FLINK-12884

Best
Yun Tang

Yun Tang

Re: Flink checkpointing state

I have added Yang Wang, who is the main developer of this feature; I think he can provide more information.

Best

Yun Tang

From: Boris Lublinsky <[hidden email]>
Sent: Tuesday, October 27, 2020 22:57
To: Yun Tang <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: Flink checkpointing state

Thanks Yun,

This refers to FLIP-144: https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink

The FLIP has two parts, leader election and HA information persistence, and offers two options.

Can you tell us what exactly will be part of 1.12?

We would be happy with the second option for now, if it's faster to implement.
