Java heap size in YARN

Pawel Bartoszek
Hi,

I have a question about configuring the task manager heap size when running a YARN session on EMR.

I am running 2 task managers on m4.4xlarge instances (64GB RAM). I would like to use as much of that memory as possible for the task manager heap.

However, when I request 56000 MB when starting the YARN session, only around 42GB is actually assigned to the TM. Do you know how I can increase that?


This is how I start YARN session:
/usr/lib/flink/bin/yarn-session.sh --container 2 --taskManagerMemory 56000 --slots 16 --detached -Dparallelism.default=32 -Dtaskmanager.network.numberOfBuffers=20480 ...


This is the output of ps aux on TM box

yarn      42843 1030 67.7 46394740 44688084 ?   Sl   15:27 175:56 /usr/lib/jvm/java-openjdk/bin/java -Xms42000m -Xmx42000m ....

yarn      42837  0.0  0.0 113104  2684 ?        Ss   15:27   0:00 /bin/bash -c /usr/lib/jvm/java-openjdk/bin/java -Xms42000m -Xmx42000m ....


I would expect around 56GB set as max heap size for TM.

Some settings from yarn-site.xml that might be of interest:

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>57344</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>5</value>
</property>


Cheers,
Pawel
  

Re: Java heap size in YARN

Pawel Bartoszek
I also tried setting taskmanager.memory.off-heap to true.

I still get around 42GB in total (heap + direct memory: 12409 MB + 29591 MB = 42000 MB):

yarn      56827  837 16.6 16495964 10953748 ?   Sl   16:53  34:10 /usr/lib/jvm/java-openjdk/bin/java -Xms12409m -Xmx12409m -XX:MaxDirectMemorySize=29591m

Cheers,
Pawel


Re: Java heap size in YARN

Kien Truong

Hi,

The relevant setting is:

containerized.heap-cutoff-ratio: (Default 0.25) Percentage of heap space to remove from containers started by YARN. When a user requests a certain amount of memory for each TaskManager container (for example 4 GB), we can not pass this amount as the maximum heap space for the JVM (-Xmx argument) because the JVM is also allocating memory outside the heap. YARN is very strict with killing containers which are using more memory than requested. Therefore, we remove this fraction of the memory from the requested heap as a safety margin and add it to the memory used off-heap.


You can reduce this to get a bigger JVM heap, or increase it to reserve more memory for off-heap usage (for example, for jobs with large RocksDB state), but I suggest not changing this setting without careful consideration.
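
As a rough check against the numbers in this thread, the cutoff arithmetic can be sketched as below. This is only a sketch, not Flink's actual code: the function name `heap_after_cutoff` is hypothetical, the 600 MB floor is an assumption based on the `containerized.heap-cutoff-min` option, and any further deductions Flink may apply are ignored.

```python
def heap_after_cutoff(container_mb: int,
                      cutoff_ratio: float = 0.25,
                      cutoff_min_mb: int = 600) -> int:
    """Estimate the memory (in MB) left for the JVM after the YARN cutoff.

    cutoff_ratio mirrors containerized.heap-cutoff-ratio (default 0.25).
    cutoff_min_mb is an assumed minimum cutoff (containerized.heap-cutoff-min).
    """
    if not 0.0 <= cutoff_ratio < 1.0:
        raise ValueError("cutoff_ratio must be in [0, 1)")
    # The cutoff is a fraction of the requested container size,
    # but never less than the assumed minimum.
    cutoff_mb = max(cutoff_min_mb, int(container_mb * cutoff_ratio))
    return container_mb - cutoff_mb


if __name__ == "__main__":
    # 56000 MB requested with the default ratio leaves 42000 MB,
    # which matches the -Xmx42000m seen in the ps output.
    print(heap_after_cutoff(56000))
```

With off-heap memory enabled, the same 42000 MB appears to be split between heap and direct memory (12409 MB + 29591 MB). A lower ratio could presumably be passed in the same way as the other `-D` options in the original command, for example as `-Dcontainerized.heap-cutoff-ratio=<value>`.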


Regards,

Kien


Re: Java heap size in YARN

Pawel Bartoszek
Thanks Kien. I will at least play with the setting :) We use Hadoop (S3) as the checkpoint store. In our case, off-heap memory is around 300MB, as reported on the task manager statistics page.

Re: Java heap size in YARN

Kien Truong
The off-heap usage reported in the task manager UI can be misleading, because it does not include the memory used by native libraries such as RocksDB, which can be huge if you have a large stateful job.

Regards,
Kien
