flink on yarn configuration

flink on yarn configuration

Pa Rö
hello community,

i want to run my flink app on a cluster (cloudera 5.4.4) with 3 nodes (each node has an i7 8-core CPU and 16GB RAM). now i want to submit my flink job on yarn (20GB RAM).

my script to deploy the flink cluster on yarn:

export HADOOP_CONF_DIR=/etc/hadoop/conf/
./flink-0.9.0/bin/yarn-session.sh -n 1 -jm 10240 -tm 10240

at the moment, the script i use to submit the job is the following:

./flink-0.9.0/bin/flink run /home/marcel/Desktop/ma-flink.jar

the flink dashboard shows only 5GB of memory used for computing my job?

maybe my configuration is not optimal?

best regards,
paul

Re: flink on yarn configuration

Till Rohrmann

Hi Paul,

when you run your Flink cluster on YARN, we cannot give Flink the full amount of the allocated container memory, because YARN itself needs some of that memory as well. Since YARN is quite strict with containers that exceed their memory limit (such a container is killed instantly), we assign 25% of the container's memory to YARN by default.

I cannot tell why it is 0.5 in your case; maybe you're using an old version of Flink. You can control the memory fraction that is given to YARN with the configuration parameter yarn.heap-cutoff-ratio.
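For example, a minimal sketch of setting it in conf/flink-conf.yaml before starting the YARN session (0.15 is just an illustrative value; the default is 0.25):

yarn.heap-cutoff-ratio: 0.15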

Cheers,
Till


Re: flink on yarn configuration

rmetzger0
In reply to this post by Pa Rö
Hi Paul,

I don't think you need 10 GB of heap space for the JobManager. Usually 1 GB is sufficient.
Since you have 3 nodes, I would start Flink with 3 task managers.
I think you could launch such a cluster like this:
./flink-0.9.0/bin/yarn-session.sh -n 3 -jm 1024 -tm 13000

Regarding the memory you are able to use in the end:
Initially, you request 10240 MB.
From that, we subtract a 25% safety margin to avoid YARN killing the JVM:
10240*0.75 = 7680 MB.
So Flink's TaskManager will see 7680 MB when starting up.
Flink's memory manager then only uses 70% of the available heap space as managed memory:
7680*0.7 = 5376 MB.
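The same arithmetic in shell form, as a quick sanity check (a sketch; 0.25 and 0.7 are the defaults described above):

CONTAINER_MB=10240                                          # what you request with -tm
HEAP_MB=$(awk "BEGIN { print $CONTAINER_MB * (1 - 0.25) }") # after the YARN cutoff: 7680
MANAGED_MB=$(awk "BEGIN { print $HEAP_MB * 0.7 }")          # managed memory: 5376
echo "heap: ${HEAP_MB} MB, managed: ${MANAGED_MB} MB"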

The safety margin for YARN is very conservative. As Till already said, you can set a different value for the "yarn.heap-cutoff-ratio" (try 0.15) and see if your job still runs.
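Putting both suggestions together, the relaunch could look like this (a sketch; if I recall correctly, yarn-session.sh also accepts dynamic properties via -D, otherwise set the key in conf/flink-conf.yaml):

export HADOOP_CONF_DIR=/etc/hadoop/conf/
./flink-0.9.0/bin/yarn-session.sh -n 3 -jm 1024 -tm 13000 -Dyarn.heap-cutoff-ratio=0.15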

