Re: 1.1.4 on YARN - vcores change?

Posted by rmetzger0 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/1-1-4-on-YARN-vcores-change-tp11016p11019.html

Hi Shannon,

Flink reads the number of available vcores from the local YARN configuration. Is it possible that the YARN / Hadoop config on the machine you are submitting the job from sets the number of vcores to 4?
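
One way to check this on the submitting machine is to look up the value in the local yarn-site.xml. The snippet below is only a sketch. It assumes the client config lives under YARN_CONF_DIR or HADOOP_CONF_DIR, and that the relevant key is yarn.nodemanager.resource.cpu-vcores (that key name is my assumption, so please verify it against your Flink and Hadoop versions). The helper read_yarn_property is a made-up name for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: this is the YARN key the client consults for the vcore limit.
NM_VCORES_KEY = "yarn.nodemanager.resource.cpu-vcores"


def read_yarn_property(yarn_site_path, key):
    """Return the value of `key` from a Hadoop-style XML config, or None if absent."""
    root = ET.parse(yarn_site_path).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == key:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Assumption: the client-side config directory is exposed via one of these variables.
    conf_dir = os.environ.get("YARN_CONF_DIR") or os.environ.get("HADOOP_CONF_DIR")
    if conf_dir is None:
        print("Neither YARN_CONF_DIR nor HADOOP_CONF_DIR is set")
    else:
        path = os.path.join(conf_dir, "yarn-site.xml")
        if os.path.isfile(path):
            print(path, "->", read_yarn_property(path, NM_VCORES_KEY))
        else:
            print("No yarn-site.xml found at", path)
```

If this prints 4 on the submitting machine while the NodeManagers report 8, the local config is the likely source of the mismatch.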


On Fri, Jan 13, 2017 at 12:51 AM, Shannon Carey <[hidden email]> wrote:
Did anything change in 1.1.4 with regard to YARN & vcores?

I'm getting this error when deploying 1.1.4 to my test cluster. Only the Flink version changed.
java.lang.RuntimeException: Couldn't deploy Yarn cluster
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:384)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:591)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:465)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The number of virtual cores per node were configured with 8 but Yarn only has 4 virtual cores available. Please note that the number of virtual cores is set to the number of task slots by default unless configured in the Flink config with 'yarn.containers.vcores.'
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:273)
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:393)
	at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:381)
	... 2 more

When I run: ./bin/yarn-session.sh -q
It shows 8 vCores on each machine:

NodeManagers in the ClusterClient 3
|Property         |Value
+---------------------------------------+
|NodeID           |ip-10-2-…:8041
|Memory           |12288 MB
|vCores           |8
|HealthReport     |
|Containers       |0
+---------------------------------------+
|NodeID           |ip-10-2-…:8041
|Memory           |12288 MB
|vCores           |8
|HealthReport     |
|Containers       |0
+---------------------------------------+
|NodeID           |ip-10-2-…:8041
|Memory           |12288 MB
|vCores           |8
|HealthReport     |
|Containers       |0
+---------------------------------------+
Summary: totalMemory 36864 totalCores 24
Queue: default, Current Capacity: 0.0 Max Capacity: 1.0 Applications: 0


I'm running:
./bin/yarn-session.sh -n 3 --jobManagerMemory 1504 --taskManagerMemory 10764 --slots 8 --detached

I have not specified any value for "yarn.containers.vcores" in my config.
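
Read literally, the error message describes a rule like the one sketched below: the vcores requested per container default to the slot count unless 'yarn.containers.vcores' is set, and deployment is refused when that number exceeds what YARN reports. This is a Python model of the message only, not Flink's actual code. The function names and the dict standing in for the Flink config are hypothetical.

```python
def requested_vcores(slots, flink_conf):
    """vcores per container: 'yarn.containers.vcores' if set, else the slot count."""
    return int(flink_conf.get("yarn.containers.vcores", slots))


def check_deployable(slots, flink_conf, yarn_vcores):
    """Raise if more vcores are requested than the YARN config allows."""
    requested = requested_vcores(slots, flink_conf)
    if requested > yarn_vcores:
        raise ValueError(
            "configured with %d vcores but YARN only has %d available"
            % (requested, yarn_vcores)
        )
    return requested


# With --slots 8, no explicit vcores setting, and a YARN limit of 4,
# this model rejects the deployment, matching the error above.
try:
    check_deployable(8, {}, 4)
except ValueError as err:
    print("rejected:", err)
```

Under this model, either lowering the slot count or setting 'yarn.containers.vcores' to a value within the limit would let the check pass.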

I switched to -n 5 and --slots 4, and halved the taskManagerMemory, which allowed the cluster to start.

However, in the YARN "Nodes" UI I see "VCores Used: 2" and "VCores Avail: 6" on all three nodes. And if I look at one of the Containers, it says, "Resource: 5408 Memory, 1 VCores". I don't understand what's happening here.

Thanks…