Flink checkpointing state

Flink checkpointing state

Boris Lublinsky
This is from Flink 1.8:

"Job Manager keeps some state related to checkpointing in its memory. This state would be lost on Job Manager crashes, which is why this state is persisted in ZooKeeper. This means that even though there is no real need for the leader election and discovery part of Flink’s HA mode (as this is handled natively by Kubernetes), it still needs to be enabled just for storing the checkpoint state."

Was this ever fixed in Flink 1.10 or 1.11? When running Flink on Kubernetes without HA, there is no ZooKeeper. If the above is still the case, a restarted Job Manager will never pick up the right checkpoint.
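
For reference, the ZooKeeper-backed HA mode that the quoted passage describes is usually switched on in flink-conf.yaml with settings along the following lines. This is only a minimal sketch: the quorum hosts, storage path, and cluster ID are placeholder values and are not taken from this thread.

```yaml
# Sketch of ZooKeeper HA settings in flink-conf.yaml (Flink 1.8 to 1.11 era keys).
# All values below are hypothetical placeholders.
high-availability: zookeeper
# ZooKeeper ensemble that stores the pointers to the latest checkpoint metadata.
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
# Durable storage for the Job Manager metadata itself; ZooKeeper only keeps a pointer to it.
high-availability.storageDir: s3://my-bucket/flink/ha
# Separates the ZooKeeper nodes of this cluster from those of other Flink clusters.
high-availability.cluster-id: /my-flink-cluster
```

Without these settings the Job Manager keeps the checkpoint bookkeeping only in memory, which is the situation the question above is about.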

Re: Flink checkpointing state

Yun Tang
Hi Boris

Please refer to FLINK-12884 [1] for the current progress of native Kubernetes HA support, which is targeted for release 1.12.


Best
Yun Tang



Re: Flink checkpointing state

Boris Lublinsky
Thanks Yun,
The FLIP contains two parts, leader election and HA information persistence, and offers two options.
Can you tell us what exactly will be part of 1.12?
We would be happy with the second option for now, if it's faster to implement.
 


Re: Flink checkpointing state

Yun Tang
Hi,

I have added Yang Wang, who is the main developer of this feature. I think he can provide more information.

Best
Yun Tang
