(DEPRECATED) Apache Flink User Mailing List archive.

Yarn configuration

Michele Bertoni

Yarn configuration

Hi everybody, I need some help configuring a YARN cluster.
I tried a lot of configurations but none of them was correct.

We have a cluster on Amazon EMR, let's say 1 manager + 5 workers, all of them m3.2xlarge, so 8 cores and 30 GB of RAM each.

What is a good configuration for such a cluster?

I would like to run 5 nodes with 8 slots each, is that correct?

Now the problems: until now I ran all tests mistakenly using 40 task managers, each with 2048 MB and 1 slot (at least it was working).

Today I found the error and tried to run 5 task managers, setting the default number of slots in flink-conf.yaml to 8 and giving a task manager memory of 23040 MB (-tm 23040), which is the limit allowed by YARN, but I am getting errors: one TM is not running because there is no available memory. It seems the JM is not using memory from the master but from the worker nodes (in fact YARN says TM number 5 is missing 2048 MB, which is the memory for the JM).

Then I reduced the memory and everything started, but I get a runtime error about missing buffers.

Can someone help me step by step toward a good configuration for such a cluster? I think the documentation is really missing details.

Thanks a lot
Best
Michele

rmetzger0

Re: Yarn configuration

Hi Michele,

Configuring a YARN cluster to allocate all available resources as well as possible is sometimes tricky, that is true.

We are aware of these problems and there are actually the following two JIRAs for this:

https://issues.apache.org/jira/browse/FLINK-937 (Change the YARN Client to allocate all cluster resources, if no argument given) --> I think the consensus on the issue was to give users an option to allocate everything (so don't do it by default)

https://issues.apache.org/jira/browse/FLINK-1288 (YARN ApplicationMaster sometimes fails to allocate the specified number of workers)

How many NodeManagers is YARN reporting in the ResourceManager UI (in the "Active Nodes" column)? I suspect 6.

How much memory per NodeManager is YARN reporting? (You can see this in the "Nodes" page of the RM)

> I would like to run 5 nodes with 8 slots each, is it correct?

Yes.

> Then I reduced the memory and everything started, but I get a runtime error about missing buffers.

What exactly is the exception?

I guess you have to give the system a few more network buffers using the taskmanager.network.numberOfBuffers config parameter.

> Can someone help me step by step toward a good configuration for such a cluster? I think the documentation is really missing details.

When starting Flink on YARN, there are usually some WARN log messages in the beginning when the system detects that specified containers will not fit in the cluster.

Also, in the ResourceManager UI, you can see the status of the scheduler. This often helps to understand what's going on, resource-wise.

In reply to Michele Bertoni's message of Fri, Jul 24, 2015, 3:58 PM.

Michele Bertoni

Re: Yarn configuration

Hi Robert,

Thanks for answering. Today I was able to try again: no, in an EMR configuration with 1 master and 5 core nodes I have 5 active nodes in the ResourceManager. That sounds strange to me: Ganglia shows 6 nodes and 1 is always idle.

The total amount of memory is 112.5 GB, that is, 22.5 GB for each of the 5.

Now I am a little lost, because I thought I was running 5 nodes for 5 TMs and the 6th (the master) as JM, but it seems I have to use the 5 core nodes for both the TMs and the JM.

By the way, what is a good value for the number of buffers?

thanks,

Best

michele

In reply to Robert Metzger's message of 24 Jul 2015, 16:38.

Michele Bertoni

Re: Yarn configuration

I have been able to run 5 TMs with -jm 2048 and -tm 20992 and 8 slots each, but the Flink dashboard says “Flink Managed Memory 10506mb” with an exclamation mark saying it is much smaller than the physical memory (30105mb). That's true, but I cannot run the cluster with more than 20992.

thanks

In reply to Michele Bertoni's message of 27 Jul 2015, 11:02.

Fabian Hueske-2

Re: Yarn configuration

Hi Michele,

The 10506 MB refers to the size of Flink's managed memory, whereas the 20992 MB refers to the total amount of TM memory. At start-up, the TM allocates a fraction of the JVM memory as byte arrays and manages this portion by itself. The remaining memory is used as regular JVM heap for TM and user code.

The purpose of the warning is to tell the user that the memory configuration might not be optimal. However, this depends of course on the setup environment, and the warning should probably be rephrased to make this clearer.

Cheers, Fabian

In reply to Michele Bertoni's message of 2015-07-27 11:07 GMT+02:00.

Michele Bertoni

Re: Yarn configuration

Hi Fabian, thanks for your reply.

So Flink is using about 50% of the memory for itself, right?

Anyway, I am now running an EMR cluster with 1 master and 5 core nodes, all of them m3.2xlarge with 8 cores and 30 GB of memory.

I would like to run Flink on YARN with 40 slots on 5 TMs with the maximum available resources. What I do is:

In flink-conf.yaml, change the number of slots to 8 and the default parallelism to 40.

Start the YARN session with the command

./yarn-session.sh -n 5 -jm 2048 -tm 23040 (23040 is the maximum allowed out of 30 GB, I don't know why)

I get an error, something like "failed allocating memory after 4/5 container available memory 20992".

I suspect that it is not using the master of the cluster for the JM but one of the core nodes, right? In fact 20992 is exactly 23040 - 2048.

Then I run it with 20992:

./yarn-session.sh -n 5 -jm 2048 -tm 20992

It succeeds in running 5 TMs with 40 slots, but when I run a program I always get

Caused by: java.io.IOException: Insufficient number of network buffers: required 40, but only 14 available. The total number of network buffers is currently set to 4096. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.

I changed the number of buffers as Robert suggested, from 2048 to 4096. One of my programs now runs, but the second still has the same problem.

Thanks for the help

Best,

michele

In reply to Fabian Hueske's message of 27 Jul 2015, 11:19.

rmetzger0

Re: Yarn configuration

Hi Michele,

> no, in an EMR configuration with 1 master and 5 core nodes I have 5 active nodes in the ResourceManager. That sounds strange to me: Ganglia shows 6 nodes and 1 is always idle.

Okay, so there are only 5 machines available to deploy containers to. The JobManager/ApplicationMaster will also occupy one container.

I guess in EMR they are not running a NodeManager on the master node, so you cannot deploy anything there via YARN.

> Now I am a little lost, because I thought I was running 5 nodes for 5 TMs and the 6th (the master) as JM, but it seems I have to use the 5 core nodes for both the TMs and the JM.

Flink on YARN can only deploy containers on machines which have a YARN NodeManager running. The JM runs on such a container.
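The container arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, not how YARN's scheduler actually places containers: it assumes five NodeManagers offering 23040 MB each (the figures reported in this thread), ignores YARN's rounding of container requests, and uses a simple first-fit placement. The function name `containers_fit` is made up for this example.

```python
def containers_fit(node_capacity_mb, num_nodes, container_sizes_mb):
    """Return True if every container can be placed on some node.

    Simplified first-fit placement, largest containers first.
    This is NOT YARN's real scheduling algorithm, only a capacity check.
    """
    free = [node_capacity_mb] * num_nodes
    for size in sorted(container_sizes_mb, reverse=True):
        for i, available in enumerate(free):
            if available >= size:
                free[i] -= size
                break
        else:
            # No node had enough free memory for this container.
            return False
    return True


NODE_MB = 23040   # memory per NodeManager, as reported in the thread
JM_MB = 2048      # -jm 2048

# 5 TaskManagers of 23040 MB plus the JobManager: the JM has nowhere to go.
print(containers_fit(NODE_MB, 5, [JM_MB] + [23040] * 5))

# 5 TaskManagers of 20992 MB plus the JobManager: the JM shares a node with one TM.
print(containers_fit(NODE_MB, 5, [JM_MB] + [20992] * 5))
```

With these numbers the first check fails and the second succeeds, which matches the observation that 20992 (23040 - 2048) is the largest TaskManager size that still leaves room for the JobManager container.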

> By the way, what is a good value for the number of buffers?

See here for an explanation of what they are used for:

http://www.slideshare.net/robertmetzger1/apache-flink-hands-on/37

I would double them until your job runs (as a first approach ;) )
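As a starting point before doubling, the Flink configuration guide of that era suggested a rule of thumb of slots-per-TM^2 * number-of-TMs * 4 buffers. The sketch below computes that value and the memory a given buffer count would occupy. It assumes a buffer size of 32 KiB (believed to be the default at the time), and the helper names are hypothetical. As this thread shows, a job can still need more than the rule of thumb suggests, so treat the result as a lower bound to start from.

```python
def rule_of_thumb_buffers(slots_per_tm, num_task_managers):
    """Rule of thumb: slots-per-TM^2 * number-of-TMs * 4."""
    return slots_per_tm ** 2 * num_task_managers * 4


def buffer_memory_mib(num_buffers, buffer_size_bytes=32 * 1024):
    """Memory taken by the network buffers, assuming 32 KiB per buffer."""
    return num_buffers * buffer_size_bytes // (1024 * 1024)


def doubling_schedule(start, steps):
    """Values to try when doubling the buffer count 'until the job runs'."""
    return [start * 2 ** i for i in range(steps)]


# Cluster discussed in this thread: 5 TaskManagers with 8 slots each.
print(rule_of_thumb_buffers(8, 5))      # suggested starting value
print(buffer_memory_mib(4096))          # memory used by 4096 buffers, in MiB
print(doubling_schedule(2048, 4))       # 2048, 4096, 8192, 16384
```

Each doubling also doubles the memory set aside for buffers, so it is worth checking the footprint before going much higher.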

> I have been able to run 5 TMs with -jm 2048 and -tm 20992 and 8 slots each, but the Flink dashboard says “Flink Managed Memory 10506mb” with an exclamation mark saying it is much smaller than the physical memory (30105mb). That's true, but I cannot run the cluster with more than 20992.

I answered that question two weeks ago on this list (in an example with 10 GB of memory):

Regarding the memory you are able to use in the end:
Initially, you request 10240 MB.
From that, we reserve a 25% safety margin so that YARN does not kill the JVM:
10240*0.75 = 7680 MB.
So Flink's TaskManager will see 7680 MB when starting up.
Flink's memory manager uses only 70% of the available heap space for managed memory:
7680*0.7 = 5376 MB.
The safety margin for YARN is very conservative. As Till already said, you can set a different value for the "yarn.heap-cutoff-ratio" (try 0.15) and see if your job still runs.
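The calculation above can be repeated for other container sizes with a small Python sketch. It only encodes the two factors mentioned in this message (the YARN heap cutoff and the 70% managed-memory share) and uses integer arithmetic rounded down; the function name and parameter names are made up for illustration, and the real TaskManager accounts for more than these two factors.

```python
def estimate_tm_memory(container_mb, heap_cutoff_pct=25, managed_pct=70):
    """Rough estimate of (JVM heap, managed memory) in MB for one TaskManager.

    heap_cutoff_pct mirrors "yarn.heap-cutoff-ratio" expressed as a percentage;
    managed_pct is the share of the heap used for Flink's managed memory.
    """
    heap_mb = container_mb * (100 - heap_cutoff_pct) // 100
    managed_mb = heap_mb * managed_pct // 100
    return heap_mb, managed_mb


# The 10 GB example from this message.
print(estimate_tm_memory(10240))

# The -tm 20992 setting used in this thread, with the default cutoff
# and with the suggested cutoff of 0.15.
print(estimate_tm_memory(20992))
print(estimate_tm_memory(20992, heap_cutoff_pct=15))
```

For -tm 20992 the estimate comes out somewhat above the 10506 MB shown in the dashboard, so the numbers should be read as approximate rather than exact.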

In reply to Michele Bertoni's message of Mon, Jul 27, 2015, 11:29 AM.

Michele Bertoni

Re: Yarn configuration

OK, thanks Robert, that is very clear now! :)

Just one question, more related to EMR than to Flink: if I cannot run anything on the EMR master, is it useful to allocate a big machine (8 cores, 30 GB) for it? I thought it was the JM, but it is not.

Il giorno 27/lug/2015, alle ore 14:56, Robert Metzger <[hidden email]> ha scritto:

Hi Michele,

> no in an EMR configuration with 1 master and 5 core I have 5 active node in the resource manager…sounds strange to me: ganglia shows 6 nodes and 1 is always offload

Okay, so there are only 5 machines available to deploy containers to. The JobManager/ApplicationMaster will also occupy one container.

I guess in EMR they are not running a NodeManager on the master node, so you can not deploy anything there via YARN.

> now i am a little lost because I thought I was running 5 node for 5 tm and the 6th (master one) as jm but it seems like I have to use the 5 core as both tm and jm

Flink on YARN can only deploy containers on machines which have a YARN NodeManager running. The JM runs on such a container.

> btw which is a good parameter for number of buffer?

see here for some explanation what they are used for:

http://www.slideshare.net/robertmetzger1/apache-flink-hands-on/37

I would double them until your job runs (as a first approach ;) )

> I have been able to run 5 tm with -jm 2048 and -tm 20992 and 8 slots each but in flink dashboard it says “Flink Managed Memory 10506mb” with an exclamation mark saying it is much smaller than the physical memory (30105mb)…that’s true but i cannot run the cluster with more than 20992

I answered that question two weeks ago on this list (in the example for 10GB of memory):

Regarding the memory you are able to use in the end:
Initially, you request 10240MB.
From that, we add a 25% safety margin to avoid that YARN is going to kill the JVM.
10240*0.75 = 7680 MB.
So Flink's TaskManager will see 7680 MB when starting up.
Flink's Memory manager is only using 70% of the available heap space for managed memory:
7680*0.7 = 5376 MB.
The safety margin for YARN is very conservative. As Till already said, you can set a different value for the "yarn.heap-cutoff-ratio" (try 0.15) and see if your job still runs.

On Mon, Jul 27, 2015 at 11:29 AM, Michele Bertoni <[hidden email]> wrote:

Hi Fabian, thanks for your reply.
So Flink is using about 50% of the memory for itself, right?

Anyway, I am now running an EMR cluster with 1 master and 5 core nodes, all of them m3.2xlarge with 8 cores and 30 GB of memory.

I would like to run Flink on YARN with 40 slots on 5 TMs and the maximum available resources. What I do is:

in flink-conf.yaml, set the number of slots to 8 and the default parallelism to 40

start the YARN session with the command

./yarn-session.sh -n 5 -jm 2048 -tm 23040 (23040 MB is the maximum allowed out of the 30 GB, I don’t know why)

I get an error, something like "failed allocating memory after 4/5 container available memory 20992"

I suspect that it is not using the master of the cluster to allocate the JM but one of the core nodes, right? In fact, 20992 is exactly 23040 - 2048.
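
The suspicion can be checked with a small placement sketch. It assumes that each of the 5 NodeManagers offers 23040 MB (22.5 GB) and that containers are placed first-fit with the JobManager allocated first; YARN's real scheduler is more involved, and `place_containers` is a hypothetical helper written only for this illustration.

```python
def place_containers(node_mb, num_nodes, jm_mb, tm_mb, num_tms):
    """First-fit placement: the JobManager container first, then the TaskManagers.

    Returns (number of TaskManagers placed, largest free block left on any node).
    """
    free = [node_mb] * num_nodes
    placed_tms = 0
    requests = [jm_mb] + [tm_mb] * num_tms  # index 0 is the JobManager
    for index, request in enumerate(requests):
        for node, available in enumerate(free):
            if available >= request:
                free[node] -= request
                if index > 0:
                    placed_tms += 1
                break
    return placed_tms, max(free)


# -tm 23040: the JobManager takes 2048 MB from one node, so only 4 TMs fit.
print(place_containers(23040, 5, jm_mb=2048, tm_mb=23040, num_tms=5))
# -tm 20992: the JobManager and one TM share a node, all 5 TMs fit.
print(place_containers(23040, 5, jm_mb=2048, tm_mb=20992, num_tms=5))
```

In the first case the largest free block is 20992 MB, the same number the error message reports.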

Then I ran it with 20992:

./yarn-session.sh -n 5 -jm 2048 -tm 20992

It succeeds in starting 5 TMs with 40 slots, but when I run a program I always get

Caused by: java.io.IOException: Insufficient number of network buffers: required 40, but only 14 available. The total number of network buffers is currently set to 4096. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.

I changed the number of buffers from 2048 to 4096, as Robert said. One of my programs now runs, but the second still has the same problem.

Thanks for the help

Best,

michele

On 27 Jul 2015, at 11:19, Fabian Hueske <[hidden email]> wrote:

Hi Michele,

The 10506 MB refers to the size of Flink's managed memory, whereas the 20992 MB refers to the total amount of TM memory. At start-up, the TM allocates a fraction of the JVM memory as byte arrays and manages this portion by itself. The remaining memory is used as regular JVM heap for the TM and user code.

The purpose of the warning is to tell the user that the memory configuration might not be optimal. However, this of course depends on the setup environment, and the warning should probably be rephrased to make this clearer.

Cheers, Fabian

2015-07-27 11:07 GMT+02:00 Michele Bertoni <[hidden email]>:

I have been able to run 5 TMs with -jm 2048 and -tm 20992 and 8 slots each, but the Flink dashboard says “Flink Managed Memory 10506mb” with an exclamation mark, saying it is much smaller than the physical memory (30105mb)… that's true, but I cannot run the cluster with more than 20992

thanks

On 27 Jul 2015, at 11:02, Michele Bertoni <[hidden email]> wrote:

Hi Robert,

thanks for answering. Today I was able to try again: no, in an EMR configuration with 1 master and 5 core nodes I have 5 active nodes in the ResourceManager… sounds strange to me: Ganglia shows 6 nodes and 1 is always idle

The total amount of memory is 112.5 GB, which is actually 22.5 GB for each of the 5 nodes

Now I am a little lost, because I thought I was running 5 nodes for 5 TMs and the 6th (the master) as JM, but it seems I have to use the 5 core nodes for both the TMs and the JM

By the way, what is a good value for the number of buffers?

thanks,

Best

michele

On 24 Jul 2015, at 16:38, Robert Metzger <[hidden email]> wrote:

Hi Michele,

Configuring a YARN cluster to allocate all available resources as well as possible is sometimes tricky, that is true.

We are aware of these problems and there are actually the following two JIRAs for this:

https://issues.apache.org/jira/browse/FLINK-937 (Change the YARN Client to allocate all cluster resources, if no argument given) --> I think the consensus on the issue was to give users an option to allocate everything (so not to do it by default)

https://issues.apache.org/jira/browse/FLINK-1288 (YARN ApplicationMaster sometimes fails to allocate the specified number of workers)

How many NodeManagers is YARN reporting in the ResourceManager UI? (in the "Active Nodes" column) (I suspect 6?)

How much memory per NodeManager is YARN reporting? (You can see this in the "Nodes" page of the RM)

> I would like to run 5 nodes with 8 slots each, is it correct?

Yes.

> Then I reduced the memory; everything started, but I get a runtime error about missing buffers

What exactly is the exception?

I guess you have to give the system a few more network buffers using the taskmanager.network.numberOfBuffers config parameter.

> Can someone help me step by step with a good configuration for such a cluster? I think the documentation is really missing details

When starting Flink on YARN, there are usually some WARN log messages at the beginning if the system detects that the specified containers will not fit in the cluster.

Also, in the ResourceManager UI, you can see the status of the scheduler. This often helps to understand what's going on, resource-wise.

On Fri, Jul 24, 2015 at 3:58 PM, Michele Bertoni <[hidden email]> wrote:

Hi everybody, I need help configuring a YARN cluster.
I tried a lot of configurations, but none of them was correct.

We have a cluster on Amazon EMR, let's say 1 master + 5 workers, all of them m3.2xlarge, so 8 cores and 30 GB of RAM each.

What is a good configuration for such a cluster?

I would like to run 5 nodes with 8 slots each, is that correct?

Now the problems: until now I ran all my tests mistakenly using 40 TaskManagers, each with 2048 MB and 1 slot (at least it was working).

Today I found the error and tried to run 5 TaskManagers, setting the default number of slots in flink-conf.yaml to 8 and giving each TaskManager 23040 MB of memory (-tm 23040), which is the limit allowed by YARN. But I am getting errors: one TM is not running because there is no memory available. It seems the JM is not using memory from the master but from the worker nodes (in fact, YARN says TM number 5 is missing 2048 MB, which is the memory for the JM).

Then I reduced the memory; everything started, but I get a runtime error about missing buffers.

Can someone help me step by step with a good configuration for such a cluster? I think the documentation is really missing details.

Thanks a lot
Best
Michele

rmetzger0

Re: Yarn configuration

Hi Michele,

I'm happy that you got it to run the way you want.

I guess services such as the HDFS NameNode and YARN's ResourceManager are running on the master.

I don't know what you are doing on the cluster, but I suspect it is for experimentation only. As long as you are not maintaining a huge HDFS installation in the cluster, you don't need a fancy machine for the master.

The documentation [1] of EMR says:

"The master node does not have large computational requirements. For most clusters of 50 or fewer nodes, consider using a m1.small for Hadoop 1 clusters and m1.large for Hadoop 2 clusters. For clusters of more than 50 nodes, consider using an m1.large for Hadoop 1 clusters and m1.xlarge for Hadoop 2 clusters."

The m1.large machines [2] have 7.5 GB of memory and 2 cores.

[1] http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-instances.html

[2] http://aws.amazon.com/ec2/previous-generation/

On Mon, Jul 27, 2015 at 5:19 PM, Michele Bertoni <[hidden email]> wrote:

OK, thanks Robert, you have been very clear now! :)

Just one question, more related to EMR than to Flink: if I cannot run anything on the EMR master, is it useful to allocate a big machine (8 cores, 30 GB) for it? I thought it hosted the JM, but it does not.
