Hi, Related to https://mail-archives.apache.org/mod_mbox/flink-dev/201709.mbox/%3CCA+faj9yvPyzmmLoEWAMPgXDP6kx+0oed1Z5k4s3K9sgiCFyb=w@...%3E and https://issues.apache.org/jira/browse/FLINK-10052, I was wondering if there's a way to prevent Flink instances from failing while doing a rolling restart on ZK followers while still keeping the quorum? This is what was shown in Flink logs while restarting ZK : ZooKeeper connection SUSPENDING. Changes to the submitted job graphs are not monitored (temporarily). I was able to reproduce this twice with a quorum of 5 ZK nodes while doing some ZK maintenance. Thanks! Kenzyme Le |
Hi, AFAIK, the features discussed in the threads you mentioned are not yet implemented. So there is no way to avoid Job restarts in case of ZK rolling restarts. I'm pulling in Till as he might know better. Regards,
Roman On Fri, Oct 16, 2020 at 7:45 PM Kenzyme <[hidden email]> wrote:
|
Hi Roman, Thank you for your reply. I'm not 100% sure if those features discussed in the threads will fix the issue, but they seemed related in some way. Basically, the expected behaviour I had for Flink was similar to how Kafka works i.e. Kafka services continues w/o disruption whenever ZK quorum is maintained during rolling updates. Best, Kenzyme Le
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 19th, 2020 at 4:38 PM, Khachatryan Roman <[hidden email]> wrote:
|
Hi Kenzyme, at the moment Flink will stop the execution of jobs when it loses its connection to ZooKeeper for whatever reason. If a ZK rolling update can cause the connection loss to the quorum, then it's what you are seeing. FLINK-10052 wants to add a feature which allows Flink to tolerate a SUSPENDED ZK connection for a short amount of time. I haven't tried it out with a rolling ZK update but it might solve the problem you are observing. Cheers, Till On Tue, Oct 20, 2020 at 5:41 AM Kenzyme <[hidden email]> wrote:
|
Hi Till,
Thank you for the explanation. Looking forward for this feature to be added in the near future. Best, Kenzyme -------- Original Message -------- On Oct. 20, 2020, 8:46 a.m., Till Rohrmann < [hidden email]> wrote:
|
Free forum by Nabble | Edit this page |