Hi all,

I submit a Flink job in yarn-cluster mode and cancel it with the savepoint option immediately after the job status changes to deployed. Sometimes I get this error:
org.apache.flink.util.FlinkException: Could not cancel job xxxx.
    at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
    at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
    at org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
    ... 6 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    ... 1 more
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
    ... 16 more
Caused by: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
    ... 7 more
I checked the JobManager log and found no errors. The savepoint was saved correctly in HDFS. The YARN application state changed to FINISHED and its FinalStatus changed to KILLED.

I think this issue occurs because RestClusterClient can no longer reach the JobManager address after the JobManager (AM) has shut down. My Flink version is 1.5.3.

Could anyone help me resolve this issue? Thanks!
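To illustrate the diagnosis, here is a minimal sketch of how the cause chain in the trace above can be checked for a java.net.ConnectException, which is what separates this case (the endpoint is already gone) from other cancel failures. The class CauseChain and the method hasCause are hypothetical names used only for illustration; they are not part of Flink, and the exception in main is simulated rather than taken from a real cluster.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not a Flink API: inspects an exception's cause chain.
public class CauseChain {

    /** Returns true if t or any of its causes is an instance of the given type. */
    public static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        int depth = 0;
        // The depth limit guards against a (rare) cyclic cause chain.
        while (t != null && depth++ < 64) {
            if (type.isInstance(t)) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated nesting that mirrors the shape of the trace above;
        // the messages are placeholders, not real Flink output.
        Throwable simulated = new Exception(
                "Could not cancel job",
                new ExecutionException(
                        new CompletionException(
                                new ConnectException("simulated"))));

        System.out.println(hasCause(simulated, ConnectException.class));
    }
}
```

A match only shows that the REST endpoint was unreachable when the client retried. It does not prove that the savepoint completed, so that still has to be verified separately, for example by checking the savepoint directory in HDFS as I did.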
Best regards!