It's me again. This is a strange issue; I hope I managed to find the right keywords.

I have 8 machines: 1 for the JM, and the other 7 are TMs with 64G of memory each.

When running my job like so:

$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....

the job completes without any problems. When running it like so:

$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....

(note the one extra MB of memory for the JM), the execution stalls, continuously reporting:

.....
TaskManager status (6/7)
TaskManager status (6/7)
TaskManager status (6/7)
.....

I did some poking around, but I couldn't find any direct correlation with the code.

The JM log says:

.....
16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - JVM Options:
16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - -Xmx12289M
.....

but then continues to report

.....
16:52:59,311 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
16:52:59,831 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
16:53:00,351 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
.....

forever until I cancel the job.

If you have any ideas I'm happy to try them out. Thanks in advance for any hints!

Cheers.

Robert

--
My GPG Key ID: 336E2680
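For anyone who wants to sanity-check the memory arithmetic behind the two commands, here is a rough sketch. It rests on assumptions that are not confirmed anywhere in this post: that YARN rounds each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (taken as 1024 MB here), that the JM container ends up on the same node as one TM container, and that each node offers a hypothetical 57344 MB to YARN. The function names and the capacity constant are made up for illustration; substitute the values from your own `yarn-site.xml`.

```python
# Hypothetical per-node memory offered to YARN (yarn.nodemanager.resource.memory-mb).
# This value is an assumption for illustration, not taken from the cluster above.
HYPOTHETICAL_NODE_CAPACITY_MB = 57344


def normalize_request(requested_mb, min_alloc_mb=1024):
    """Round a container request up to the next multiple of min_alloc_mb.

    Assumes the scheduler normalizes requests this way; the 1024 MB
    default is an assumption about the cluster configuration.
    """
    multiples = -(-requested_mb // min_alloc_mb)  # ceiling division on ints
    return max(1, multiples) * min_alloc_mb


def fits_on_node(jm_mb, tm_mb, node_capacity_mb, min_alloc_mb=1024):
    """Check whether a JM and a TM container fit together on one node."""
    total = normalize_request(jm_mb, min_alloc_mb) + normalize_request(tm_mb, min_alloc_mb)
    return total <= node_capacity_mb


if __name__ == "__main__":
    for jm_mb in (16384, 16385):
        print(
            jm_mb,
            normalize_request(jm_mb),
            fits_on_node(jm_mb, 40960, HYPOTHETICAL_NODE_CAPACITY_MB),
        )
```

Under these assumptions a single extra MB pushes the JM request into the next 1024 MB bucket, which would be enough to stop the seventh TM container from fitting on the node that also hosts the JM. Whether that is what actually happens here depends on the real scheduler settings.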