Re: Containers are not released after job failed
Posted by Till Rohrmann
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Containers-are-not-released-after-job-failed-tp27549p27558.html
Hi,
have you checked whether the same problem also occurs with one of the latest Flink releases (1.8.0, 1.7.2 or 1.6.4)?
If yes, then I would need to take a look at the logs to better understand what's happening.
Cheers,
Till
Hi,
I will loop in Till here, who might know about this problem. In the
meantime, could you tell us a bit more about your setup/deployment
(how is YARN configured, and how is the Flink job submitted?) and
link to the full logs?
Thanks,
Timo
On 26.04.19 at 11:15, 刘建刚 wrote:
I run Flink 1.6.2 on YARN. At some point, the job failed because of:
org.apache.flink.util.FlinkException: The assigned slot container_e708_1555051789618_2644286_01_000061_0 was removed
The job then restarts. After some time, the container
container_e708_1555051789618_2644286_01_000061 has still not been
released.
The log of container_e708_1555051789618_2644286_01_000061 is as
follows:
The log shows that two tasks were canceled before successful
registration at the resource manager and one was canceled after
registration. After five minutes, the container registers again. In
the end, the container is alive but not used.
Does anyone have any idea about this problem? Thank you.
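
For anyone collecting the full logs requested earlier in this thread, the YARN application ID can usually be read off the container ID quoted above. The following is only a minimal sketch: it assumes the common YARN container ID layout (`container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>`), and the helper names `application_id` and `yarn_logs_command` are hypothetical, not part of Flink or Hadoop. Note that the ID in the exception message carries an extra trailing `_0`, which appears to be the slot index and is not part of the container ID, so the sketch rejects it rather than guessing.

```python
import re

# Assumed YARN container ID layout (not taken from the thread):
# container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>
CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<container>\d+)$"
)


def application_id(container_id):
    """Derive the YARN application ID from a container ID (hypothetical helper)."""
    match = CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError("not a recognizable YARN container ID: %r" % container_id)
    return "application_%s_%s" % (match.group("cluster_ts"), match.group("app"))


def yarn_logs_command(container_id):
    """Build the argument list for fetching one container's aggregated logs."""
    return [
        "yarn", "logs",
        "-applicationId", application_id(container_id),
        "-containerId", container_id,
    ]


if __name__ == "__main__":
    cid = "container_e708_1555051789618_2644286_01_000061"
    print(" ".join(yarn_logs_command(cid)))
```

Depending on the Hadoop version, `yarn logs` may additionally require `-nodeAddress` when `-containerId` is given, and aggregated logs are generally only available if log aggregation is enabled on the cluster.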