Re: ZooKeeper connection SUSPENDING

Posted by r_khachatryan on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/ZooKeeper-connection-SUSPENDING-tp38779p38798.html

Hi,

AFAIK, the features discussed in the threads you mentioned are not yet implemented, so there is currently no way to avoid job restarts during a ZK rolling restart.
I'm pulling in Till as he might know better.

Regards,
Roman


On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <[hidden email]> wrote:
Hi,

Related to https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w@...%3E and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering whether there is a way to prevent Flink instances from failing during a rolling restart of ZK followers while the quorum is maintained.

This is what the Flink logs showed while ZK was restarting:
ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are not monitored (temporarily).

I was able to reproduce this twice with a quorum of 5 ZK nodes while doing some ZK maintenance.
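
For reference, a ZooKeeper ensemble stays available as long as a strict majority of its servers is up, so a 5-node ensemble should tolerate two servers being down at once. A minimal sketch of that arithmetic follows; the function names are illustrative only and are not part of ZooKeeper or Flink:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (hypothetical helper)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be positive")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def keeps_quorum(ensemble_size: int, servers_down: int) -> bool:
    """True if the remaining servers still form a strict majority."""
    return ensemble_size - servers_down >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # The 5-node ensemble from this report, restarting one follower at a time.
    print(quorum_size(5), tolerated_failures(5), keeps_quorum(5, 1))
```

So restarting one follower at a time keeps the ensemble itself available. A client that was connected to the restarted server still loses its connection for a moment, though, which is presumably what shows up as SUSPENDING on the Flink side.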

Thanks!

Kenzyme Le