http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Futures-timed-out-when-trying-to-cancel-a-job-with-savepoint-tp21808.html
Hey guys,
We just built a brand new Flink 1.4.0 cluster with HA and everything seems to be working fine, but we are getting some errors with savepoints.
For example, I have a running job
------------------ Running/Restarting Jobs -------------------
25.07.2018 11:55:18 : e5280bad25a7f19122f98483f94aba26 : Mr Banks (RUNNING)
--------------------------------------------------------------
If I try to create a savepoint with
flink savepoint e5280bad25a7f19122f98483f94aba26
The command just stays there and never returns (I waited about 10 minutes, with no response). Then I tried to cancel with savepoint:
flink cancel e5280bad25a7f19122f98483f94aba26 -s
And I got a
java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
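For context, the wording of this error looks like a client-side wait expiring rather than the operation itself failing. That reading is an assumption, not something the logs above confirm. A minimal Python sketch of the general pattern (not Flink code; `slow_savepoint` and `wait_with_timeout` are hypothetical names) shows how a caller can see a timeout while the work still completes later:

```python
import concurrent.futures
import time


def slow_savepoint(duration_s):
    """Hypothetical stand-in for a long-running server-side operation."""
    time.sleep(duration_s)
    return "savepoint-done"


def wait_with_timeout(duration_s, client_timeout_s):
    """Submit the work, then wait at most client_timeout_s for its result.

    Returns (client_outcome, final_result): the client may observe a
    timeout even though the operation itself finishes afterwards.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_savepoint, duration_s)
        try:
            client_outcome = future.result(timeout=client_timeout_s)
        except concurrent.futures.TimeoutError:
            client_outcome = "timed-out"
        # The work keeps running after the client stops waiting.
        final_result = future.result()
    return client_outcome, final_result


if __name__ == "__main__":
    print(wait_with_timeout(duration_s=0.3, client_timeout_s=0.1))
```

If the same pattern applies here, the savepoint may still have been written after the CLI gave up, so the target directory is worth checking.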
I checked the jobmanager logs, but I can't see any problems. I also checked the Hadoop logs for errors (thinking the problem might be in the underlying system), but it seems it did create the nodes properly; at least, there are no errors there either.
Is there anything else I should check?
PS: My state is not that big (my napkin calculations say it's less than 1 GB), so the problem doesn't seem to be the state taking too long to save.
--
Julio Biason, Software Engineer
AZION | Deliver. Accelerate. Protect.
Office: +55 51 3083 8101 | Mobile: +55 51 99907 0554