Hi,
Yesterday morning I got below error in Zookeeper. After this error, my Flink did not connect to ZK and jobs went to hang state. I had to cancel and redeploy my all jobs to bring it to normal state.
2020-02-28 02:45:56,811 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1701028573403f3, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
At the same time I saw below error in Flink.
2020-02-28 02:46:49,095 ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (zk-cs:2181) and timeout (3000) / elapsed (14305)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Has anyone face similar error earlier.
My environment is
Azure Kubernetes 1.15.7
Flink 1.6.0
Zookeeper 3.4.10
Warm Regards,
Samir Chauhan
Free forum by Nabble | Edit this page |