Re: TimeoutException in Flink 1.11 stop command
Posted by Diwakar Jha
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/TimeoutException-in-Flink-1-11-stop-command-tp43628p43679.html
Thanks.
I tried this command and it worked.
flink stop -p s3a://path_to_savepoint/savepoints 5f9241d336ea2c652a84f79ac3158597 -yid application_1620673166934_0001
I will also look at "client.timeout" to figure out what actually happened.
Thanks.
On Tue, May 11, 2021 at 3:04 AM Chesnay Schepler <[hidden email]> wrote:
Essentially, this exception just means that the savepoint operation took longer than the CLI expected.
This can occur for a number of reasons. Maybe everything is working as expected but the timeout is just too low (controlled via "client.timeout"). It could also be that the savepoint operation takes abnormally long, for example due to IO bottlenecks.
I suggest looking into the JobManager logs to see whether the savepoint was actually created and the application shut down; if so, then maybe just increase the timeout.
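As a minimal sketch of the "increase the timeout" suggestion, assuming the CLI picks up its settings from the flink-conf.yaml in the client's configuration directory: the option name "client.timeout" comes from this thread, while the 10-minute value is purely an illustrative assumption and should be sized to how long savepoints actually take for the job.

```yaml
# flink-conf.yaml on the machine where "flink stop" is run (assumed location)
# How long the CLI waits for the requested operation before giving up.
# "10 min" is an illustrative value, not a recommendation.
client.timeout: 10 min
```

If the JobManager logs show that the savepoint completed after the CLI had already given up, a larger value like this would be the first thing to try; if the savepoint never completed, raising the timeout alone is unlikely to help.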
On 5/11/2021 9:06 AM, Diwakar Jha wrote:
Hello,
I'm trying to use the Flink 1.11 stop command to gracefully shut down an application with a savepoint.
flink stop --savepointPath s3a://path_to_save_point c5d52e0146258f80fd52a3bf002d2a1b -yid application_1620673166934_0001
2021-05-11 06:26:57,852 ERROR org.apache.flink.client.cli.CliFrontend [] - Error while running the command.
org.apache.flink.util.FlinkException: Could not stop with a savepoint job "c5d52e0146258f80fd52a3bf002d2a1b".
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_252]
    at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_252]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.2.1-amzn-1.jar:?]
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) [flink-dist_2.12-1.11.0.jar:1.11.0]
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992) [flink-dist_2.12-1.11.0.jar:1.11.0]
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
    ... 9 more
The cancel command seems to be working fine.
Please let me know how to fix this TimeoutException.
Thanks.