Re: All but one TMs connect when JM has more than 16G of memory
Posted by
rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/All-but-one-TMs-connect-when-JM-has-more-than-16G-of-memory-tp2974p2976.html
Hi Robert,
the problem here is that YARN's scheduler (YARN has several: FIFO, CapacityScheduler, ...) is not giving Flink's ApplicationMaster/JobManager all the containers it requests. Once you increase the size of the AM/JM container, there is probably not enough memory left on the cluster to fit the last TaskManager container.
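To illustrate the arithmetic, here is a rough sketch with made-up numbers (not taken from your cluster). It assumes a simple first-fit placement and that requests are rounded up to a fixed granularity; `step_mb`, the node sizes and the function names are all hypothetical, and the real scheduler is more involved than this.

```python
def round_up(mb, step_mb):
    """Round a memory request up to the next multiple of step_mb."""
    return -(-mb // step_mb) * step_mb


def task_managers_that_fit(node_free_mb, am_mb, tm_mb, step_mb=1024):
    """Place the AM/JM on the first node with room, then count how many
    TaskManager containers fit into the memory that is left over.

    This is a simplified model, not YARN's actual placement logic.
    """
    free = list(node_free_mb)
    am = round_up(am_mb, step_mb)
    tm = round_up(tm_mb, step_mb)

    for i, mb in enumerate(free):
        if mb >= am:
            free[i] -= am  # the AM/JM container occupies part of this node
            break
    else:
        return 0  # not even the AM/JM fits anywhere

    # Each node can host as many whole TM containers as its free memory allows.
    return sum(mb // tm for mb in free)


# Hypothetical cluster: 4 nodes with 25 GB each, TaskManagers of 8 GB.
nodes = [25 * 1024] * 4
small_am = task_managers_that_fit(nodes, am_mb=2 * 1024, tm_mb=8 * 1024)
large_am = task_managers_that_fit(nodes, am_mb=17 * 1024, tm_mb=8 * 1024)
print(small_am, large_am)
```

In this toy setup, a request for 11 TaskManagers fits with the small AM/JM, while the large AM/JM leaves room for only 10, so one request stays pending even though the total cluster memory looks sufficient.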
I ran into the same issue when I wanted to run a Flink job on YARN: in theory all the containers should have fit, but YARN did not give me all the containers I requested.
Back then, I asked on the yarn-dev list [1] (there were also some off-list emails) but we could not resolve the issue.
Can you check the ResourceManager logs? Maybe there is a log message that explains why the container request of Flink's AM is not being fulfilled.