Re: ZooKeeper connection SUSPENDING

Posted by Kenzyme on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/ZooKeeper-connection-SUSPENDING-tp38779p38804.html

Hi Roman,

Thank you for your reply.

I'm not 100% sure whether the features discussed in those threads would fix the issue, but they seemed related.

Basically, I expected Flink to behave the way Kafka does, i.e. Kafka services continue without disruption as long as the ZK quorum is maintained during rolling updates.

Best,

Kenzyme Le


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 19th, 2020 at 4:38 PM, Khachatryan Roman <[hidden email]> wrote:
Hi,

AFAIK, the features discussed in the threads you mentioned are not yet implemented. So there is no way to avoid Job restarts in case of ZK rolling restarts.
I'm pulling in Till as he might know better.

Regards,
Roman


On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <[hidden email]> wrote:
Hi,

Related to https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w@...%3E and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering whether there is a way to prevent Flink instances from failing during a rolling restart of ZK followers, as long as the quorum is maintained?

This is what the Flink logs showed while ZK was restarting:
ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are not monitored (temporarily).

I was able to reproduce this twice with a quorum of 5 ZK nodes while doing some ZK maintenance.

Thanks!

Kenzyme Le
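

For reference, the quorum arithmetic behind the "quorum of 5 ZK nodes" setup above can be sketched as follows. This is a minimal illustration that assumes ZooKeeper's default majority quorum; the function names are hypothetical and it does not model Flink or Curator behaviour in any way.

```python
# Minimal sketch of majority-quorum arithmetic for a ZooKeeper ensemble.
# Assumption: the ensemble uses the default majority quorum.
# All function names here are hypothetical, for illustration only.

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def quorum_held(ensemble_size: int, nodes_down: int) -> bool:
    """True if the remaining servers still form a majority."""
    return ensemble_size - nodes_down >= quorum_size(ensemble_size)


if __name__ == "__main__":
    n = 5
    print(f"ensemble={n} quorum={quorum_size(n)} "
          f"tolerated_failures={tolerated_failures(n)}")
    # Restarting followers one at a time takes only one server down at once.
    print("quorum held with 1 node down:", quorum_held(n, 1))
```

Under that assumption, a 5-node ensemble keeps its quorum while one follower at a time is restarted. As the thread describes, Flink still logged the SUSPENDING message in that situation, so keeping the quorum alone did not prevent the disruption.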