I'm testing a per-job cluster on YARN.
I just need to launch 7 TMs, each with 50GB of memory (350GB in total), but Flink makes more resource requests to YARN than necessary. All of the remaining memory in YARN, around 370GB, is reserved by the Flink job, which I can see in the YARN UI. The remaining memory is not used, just reserved; that's very weird. Attached is the JM log, jmlog.txt. Any help would be greatly appreciated! Thanks, - Dongwon
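For reference, a launch command along these lines produces the setup described above; the specific values (parallelism, job jar path) are illustrative assumptions rather than details taken from the original report:

    # hypothetical per-job launch on YARN (Flink 1.5/1.6-era CLI flags)
    # -yn: number of TaskManagers, -ytm: TM memory in MB, -ys: slots per TM
    ./bin/flink run -m yarn-cluster \
        -yn 7 -ytm 51200 -ys 16 \
        -p 100 \
        /path/to/my-job.jar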
Hi Dongwon, I see that you are using the latest master (Flink 1.6-SNAPSHOT). I couldn't find a JIRA issue to point you to. Till (in CC) should know more details about this problem. Best, Fabian
2018-05-05 12:50 GMT+02:00 Dongwon Kim <[hidden email]>:
Hi Fabian and Till,

Below is what I've observed today. I hope it provides strong evidence to figure out the problem. I attach another log file, jmlog2.txt, which shows the different behavior of a per-job cluster when the YARN NodeManagers are given more memory (compared to jmlog.txt).

- jmlog.txt: Each of the 7 NodeManagers has 96GB. Only a single TM (50GB) can be scheduled on each NM, so I ended up with only 7 TaskManagers. There is no room for extra unnecessary TaskManagers.
- jmlog2.txt: Each of the 7 NMs has 128GB. After scheduling a TM on each NM, the RM can schedule 7 additional TMs, as each NM has 78GB remaining.

What I see in both log files is that:
- ExecutionGraph creates 100 tasks, as I specified.
- Initially the 7 necessary containers (for 7 TMs, each with 16 slots) are requested from YARN, which is the desired behavior.
- However, 93 extra unnecessary requests are made after the very first TaskManager is registered with the SlotManager, with the following messages:
  + jmlog.txt: Register TaskManager 640b098f3a132b452a74673631a0bf
  + jmlog2.txt: Registering TaskManager container_1525676778566_0001_
  (Please note that the info messages differ between jmlog.txt and jmlog2.txt; this is due to a recent hotfix, "Add resourceId to TaskManager registration messages".)

The 93 extra containers should not be requested, as the JobMaster will have enough slots once the remaining 6 TaskManagers are registered with the SlotManager. This causes a deadlock situation if YARN does not have enough resources to allocate those 93 containers, as in jmlog.txt.

Unlike in jmlog.txt, jmlog2.txt shows:
- Extra TMs are scheduled on the newly allocated containers.
- Extra TMs are not given any tasks for a while.
- Extra TMs are shut down with the message below.
  "Closing TaskExecutor connection container_1525676778566_0001_
- At the end, there are no pending container requests in jmlog2.txt.

P.S. I just found that SlotManager is only for FLIP-6. Nevertheless, I'm writing this email to user@ as I originally started this thread on user@. Sorry for the inconvenience. - Dongwon

On Mon, May 7, 2018 at 9:27 PM, Fabian Hueske <[hidden email]> wrote:
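To spell out the arithmetic behind Dongwon's observation (this is a reading of the numbers above, under the assumption that the ResourceManager requests one container per outstanding slot request, which matches the explanation in the reply below):

    tasks (parallelism)              : 100  -> 100 slot requests
    slots actually needed            : 100, covered by ceil(100/16) = 7 TMs x 16 slots = 112 slots
    containers requested up front    : 7
    extra requests after 1st TM      : 100 - 7 = 93  (one per remaining slot request)
    memory if all were granted       : 100 TMs x 50GB = 5000GB, far beyond the cluster's capacity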
Hi Dongwon, Fabian is right with his analysis of the problem. Currently, the new ResourceManager implementation starts a new TM per requested slot, independent of the number of configured slots per TM. This behavior can cause the cluster to request too many resources when a job is started. However, the superfluous resources will be released once the TMs have been idle for too long. This is a limitation which the community will most likely address in the next release. At the moment, it is recommended to set the number of slots per TM to 1. That way the system won't allocate too many TMs. Of course, you should then also adapt the TM memory accordingly. Cheers, Till
On Mon, May 7, 2018 at 4:08 PM, Dongwon Kim <[hidden email]> wrote:
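A sketch of how Till's suggestion could be applied to the setup in this thread; the memory figure is an assumption obtained by splitting the original 50GB across 16 single-slot TMs, so adjust it to the job's actual needs:

    # flink-conf.yaml (suggested workaround: one slot per TM)
    taskmanager.numberOfTaskSlots: 1
    # roughly 50GB / 16 slots ~= 3200MB per single-slot TM (assumed figure)
    taskmanager.heap.mb: 3200

    # or, equivalently, on the per-job command line:
    #   ./bin/flink run -m yarn-cluster -ys 1 -ytm 3200 ... /path/to/my-job.jar

With one slot per TM, each slot request maps to exactly one container, so the number of allocated TMs should match the job's parallelism and no superfluous containers are requested.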