(DEPRECATED) Apache Flink User Mailing List archive.

All but one TMs connect when JM has more than 16G of memory

Robert Schmidtke

All but one TMs connect when JM has more than 16G of memory

It's me again. This is a strange issue; I hope I managed to find the right keywords. I have 8 machines: 1 for the JM, and the other 7 for the TMs, with 64G of memory each.

When running my job like so:

$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .....

The job completes without any problems. When running it like so:

$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn 7 .....

(note the one additional MB of memory for the JM), the execution stalls, continuously reporting:

.....
TaskManager status (6/7)
.....

I did some poking around, but I couldn't find any direct correlation with the code.

The JM log says:

.....
16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - JVM Options:
16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ - -Xmx12289M
.....

but then continues to report

.....
16:52:59,311 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
16:52:59,831 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
16:53:00,351 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The user requested 7 containers, 6 running. 1 containers missing
.....

forever until I cancel the job.

If you have any ideas I'm happy to try them out. Thanks in advance for any hints! Cheers.

Robert

My GPG Key ID: 336E2680
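
The -Xmx12289M value in the JM log above is smaller than the 16385 MB requested with -yjm. The two numbers are consistent with a rule that reserves about a quarter of the container for non-heap memory and gives the rest to the JVM heap. The sketch below reproduces the logged value under that assumption; the 25% ratio and the truncation step are inferred from the numbers in this message, not confirmed anywhere in the thread, and the function name is hypothetical.

```python
def jvm_heap_mb(container_mb: int, cutoff_ratio: float = 0.25) -> int:
    """Heap left after reserving a share of the container (assumed rule)."""
    # Assumption: the reserved share is truncated to whole megabytes.
    cutoff_mb = int(container_mb * cutoff_ratio)
    return container_mb - cutoff_mb


print(jvm_heap_mb(16385))  # 12289, the value shown as -Xmx12289M in the log
print(jvm_heap_mb(16384))  # 12288
```

If this reading is right, the heap setting itself is unremarkable, and the stall has to come from container placement rather than from the JVM options.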

Robert Schmidtke

Re: All but one TMs connect when JM has more than 16G of memory

I should say I'm running the current Flink master branch.

On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke <[hidden email]> wrote:

rmetzger0

Re: All but one TMs connect when JM has more than 16G of memory

In reply to this post by Robert Schmidtke

Hi Robert,

The problem here is that YARN's scheduler (there are different schedulers in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's ApplicationMaster/JobManager all the containers it is requesting. With the larger AM/JM container, there is probably no memory left to fit the last TaskManager container.

I also experienced this issue when I wanted to run a Flink job on YARN: the containers should have fit in theory, but YARN was not giving me all the containers I requested.

Back then, I asked on the yarn-dev list [1] (there were also some off-list emails), but we could not resolve the issue.

Can you check the ResourceManager logs? Maybe there is a log message that explains why the container request of Flink's AM is not fulfilled.

[1] http://search-hadoop.com/m/AsBtCilK5r1pKLjf1&subj=Re+QUESTION+Allocating+a+full+YARN+cluster

Robert Schmidtke

Re: All but one TMs connect when JM has more than 16G of memory

Hi Robert,

Thanks for your reply. It got me digging into my setup, and I discovered that one TM was scheduled next to the JM. The documentation suggests that -yn 7 specifies the number of TMs (of which I wanted 7), and I thought an additional container would be used for the JM (my YARN cluster has 8 containers). Anyway, with this setup the memory on that machine added up to 56G and 1M (40G per TM and 16G 1M for the JM), but I set a hard maximum of 56G in my yarn-site.xml, which is why the request could not be fulfilled.

It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop. Note that I have yarn.nodemanager.vmem-check-enabled set to false. So this is probably a YARN issue, or my bad configuration.

I'm in a rush now (to get to the Flink meetup), so I will check the documentation later to see how to deploy the TMs and the JM on separate machines. That is not what's happening at the moment, but it is what I'd like to have. Thanks again and see you in an hour.

Cheers

Robert
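
The arithmetic in this message can be checked with a few lines of Python. This is a minimal sketch that only models the one machine hosting both the JM and a TM, using the 56G node limit and the -yjm/-ytm values from the thread. It ignores any rounding the YARN scheduler may apply to container requests, and the function name is hypothetical.

```python
NODE_MB = 56 * 1024  # yarn.nodemanager.resource.memory-mb, 56G as stated above
TM_MB = 40960        # -ytm


def fits_on_one_node(jm_mb: int, tm_mb: int = TM_MB, node_mb: int = NODE_MB) -> bool:
    """True if a JM container and a TM container fit on the same node."""
    return jm_mb + tm_mb <= node_mb


for jm_mb in (16384, 16385):  # the two -yjm values tried in the thread
    total_mb = jm_mb + TM_MB
    print(f"-yjm {jm_mb}: JM + TM = {total_mb} MB, "
          f"fits in {NODE_MB} MB: {fits_on_one_node(jm_mb)}")
```

With -yjm 16384 the two containers fill the node exactly (57344 MB); one more megabyte pushes the sum to 57345 MB, so the TM that would share the JM's machine can no longer be placed there.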

On Wed, Sep 30, 2015 at 5:19 PM, Robert Metzger <[hidden email]> wrote:

rmetzger0

Re: All but one TMs connect when JM has more than 16G of memory

Hi,

It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop.

Is it a "hard error" (failing) that you're getting, or just "WARN" log messages? I'm asking because some time ago I added code that runs some checks before deploying Flink on YARN. These checks print WARN log messages if the requested YARN session/job does not fit onto the cluster.

This "endless loop" exists because in many production environments Flink can just wait for resources to become available, for example when other containers finish.

Robert
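
The waiting behavior described in this reply can be pictured as a polling loop. The sketch below is purely illustrative and is not Flink's code: the function name, its parameters, and the added deadline are all hypothetical, and the caller supplies the count of running containers through a callback.

```python
import time


def wait_for_containers(requested, count_running, timeout_s=60.0, poll_s=0.5):
    """Poll until `requested` containers are running or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_running() >= requested:
            return True   # all containers have arrived
        if time.monotonic() >= deadline:
            return False  # give up instead of waiting forever
        time.sleep(poll_s)


# Simulates the situation in this thread: 7 requested, only 6 ever running.
print(wait_for_containers(7, lambda: 6, timeout_s=0.1, poll_s=0.05))
```

On a shared cluster, waiting without a deadline is reasonable because other applications eventually release containers; on a dedicated cluster where the request can never fit, a deadline like this one would turn the stall into a visible failure.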

On Wed, Sep 30, 2015 at 6:33 PM, Robert Schmidtke <[hidden email]> wrote:

Robert Schmidtke

Re: All but one TMs connect when JM has more than 16G of memory

Hi Robert,

I had a job failure yesterday with what I believe is the setup I have described above. However, when I try to reproduce it now, the behavior is the same: Flink waits for resources to become available. So no hard error.

OK, the looping makes sense then; I hadn't thought about shared setups. I'm still figuring out how all the parameters play together, i.e. -yn, -yjm, -ytm and the memory limits in yarn-site.xml. This will need some testing, and I'll come back with a proper description once I think I know what's going on.

When running Flink on YARN, is there an easy way to place the Flink JM where the YARN ResourceManager sits, and all the TMs on the remaining NodeManagers?

Robert

On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger <[hidden email]> wrote:

rmetzger0

Re: All but one TMs connect when JM has more than 16G of memory

Hi,

There is currently no option for forcing certain containers onto specific machines.

For running the JM (or any other YARN container) on the RM host, you first need to have a NodeManager running on the host with the RM. Maybe YARN is smart enough to schedule the small JM container onto that machine.

I don't know your exact setup, but maybe it would make sense for you to run Flink in standalone cluster mode instead of on YARN. It seems that you have a very good idea of how and where you want to run the Flink services in your cluster. YARN is designed to be an abstraction between the cluster and the application, which is why it's a bit difficult to schedule containers onto specific machines.

Robert

On Thu, Oct 1, 2015 at 11:24 AM, Robert Schmidtke <[hidden email]> wrote:

Robert Schmidtke

Re: All but one TMs connect when JM has more than 16G of memory

I see, thanks for the info. I only have access to my cluster via SLURM, and we don't have ssh between our nodes, which is why I haven't really considered the Standalone mode. A colleague has set up YARN on SLURM, and it was just the easiest to use. I briefly looked into the Flink Standalone mode but dropped it because I thought YARN would be possible after all. It seems I'm going to have a deeper look into starting the master and slaves with SLURM's srun instead of ssh (I guess a slight modification of start-cluster.sh should do the job).

On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger <[hidden email]> wrote:

rmetzger0

Re: All but one TMs connect when JM has more than 16G of memory

Feel free to contribute documentation to Flink on how to run Flink on SLURM.

On Thu, Oct 1, 2015 at 11:45 AM, Robert Schmidtke <[hidden email]> wrote:

I see, thanks for the info. I only have access to my cluster via SLURM and we don't have ssh between our nodes which is why I haven't really considered the Standalone mode. A colleague has set up YARN on SLURM and it was just the easiest to use. I briefly looked into the Flink Standalone mode but dropped it because I thought YARN would be possible after all. It seems I'm going to have a deeper look into starting the master and slaves with SLURM's srun instead of ssh (I guess a slight modification of start-cluster.sh should do the job).

On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger <[hidden email]> wrote:
Hi,
there is currently no option for forcing certain containers onto specific machines.
For running the JM (or any other YARN container) on the AM host, you first need to have a NodeManager running on the host with the RM. Maybe YARN is smart enough to schedule the small JM container onto that machine.

I don't know your exact setup, but maybe it would make sense for you to run Flink in the standalone cluster mode instead of on YARN. It seems that you have a very good idea of how and where you want to run the Flink services in your cluster. YARN is designed to be an abstraction between the cluster and the application, which is why it's a bit difficult to schedule the containers onto specific machines.

Robert

On Thu, Oct 1, 2015 at 11:24 AM, Robert Schmidtke <[hidden email]> wrote:
Hi Robert,

I had a job failure yesterday with what I believe is the setup I have described above. However when trying to reproduce now, the behavior is the same: Flink waiting for resources to become available. So no hard error.

Ok, the looping makes sense then. I haven't thought about shared setups. I'm still figuring out how all parameters play together, i.e. -yn, -yjm, -ytm and the memory limits in yarn-site.xml. This will need some testing and I'll come back with a proper description once I think I know what's going on.

When running Flink on YARN, is it easily possible to place the Flink JM where the YARN Resource Manager sits, and all the TMs with the remaining Node Managers?

Robert

On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger <[hidden email]> wrote:
Hi,

It is interesting to note that when I set both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when requesting 56G and 1M, but when setting yarn.nodemanager.resource.memory-mb to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an error but the aforementioned endless loop.

Is it a "hard error" (failing) you're getting, or just "WARN" log messages? I'm asking because I added some code a while ago that runs some checks before deploying Flink on YARN. These checks print WARN log messages if the requested YARN session/job does not fit onto the cluster.
This "endless loop" exists because in many production environments Flink can just wait for resources to become available, for example when other containers are finishing.
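
As a rough illustration of the kind of pre-deployment check described above, here is a minimal Python sketch that collects warnings instead of failing. It is hypothetical and not Flink's actual code: the function name, its parameters, and the two checks (per-container maximum and aggregate free memory) are assumptions made for this example.

```python
def preflight_warnings(jm_mb, tm_mb, num_tms, node_free_mb, max_alloc_mb):
    """Return warning strings for a request; an empty list means no complaint."""
    warnings = []

    # Check 1 (assumed): no single container may exceed the scheduler maximum.
    for name, mb in (("JobManager", jm_mb), ("TaskManager", tm_mb)):
        if mb > max_alloc_mb:
            warnings.append(
                f"{name} container of {mb} MB exceeds the maximum allocation of {max_alloc_mb} MB"
            )

    # Check 2 (assumed): the whole request must fit into the free memory overall.
    total_requested = jm_mb + num_tms * tm_mb
    total_free = sum(node_free_mb)
    if total_requested > total_free:
        warnings.append(
            f"requested {total_requested} MB in total, but only {total_free} MB are free"
        )

    return warnings


# Numbers from this thread, assuming 7 NodeManagers with 56G each.
for line in preflight_warnings(16385, 40960, 7, [57344] * 7, 57344):
    print("WARN", line)
```

With these numbers the sketch prints nothing, because the request fits in aggregate even though no single node may have room for the last TM. That is one way a request can pass such a check and still end up waiting for resources.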

Robert


Robert Schmidtke

Re: All but one TMs connect when JM has more than 16G of memory

So for anyone who is interested, here are some code references for getting started with Flink on Slurm.

I added basic start and stop scripts for Flink on Slurm in my fork:

https://github.com/robert-schmidtke/flink/tree/flink-slurm/flink-dist/src/main/flink-bin/bin

And I also created an example of how to configure and run it:

https://github.com/robert-schmidtke/flink-slurm/blob/master/flink-slurm-example.sh

I'm not sure I will put much more effort into this, because it works for my setup right now. However, if there's wider interest I can add a bit more documentation and insight.

Robert
