Checking actual config values used by TaskManager


Ken Krugler
Hi all,

I’m running jobs on EMR via YARN, and wondering how to check exactly what configuration settings are actually being used.

This is mostly for the TaskManager.

I know I can modify the conf/flink-conf.yaml file, and (via the CLI) I can use -yD param=value.

But my experience with Hadoop makes me want to see the exact values being used, versus assuming I know what’s been set :)

Thanks,

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




Re: Checking actual config values used by TaskManager

Timur Fayruzov
If you're talking about parameters that were set on JVM startup then `ps aux|grep flink` on an EMR slave node should do the trick, that'll give you the full command line.
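A minimal sketch of that check, assuming the full command line has already been copied out of the `ps aux` output. The sample command line and the `jvm_options` helper are hypothetical, made up for illustration; a real line would be longer. This only surfaces options passed to the JVM at startup.

```python
import shlex

# Hypothetical command line, shaped like the command column of `ps aux` output.
SAMPLE_COMMAND = (
    "java -Xms4500m -Xmx4500m -XX:MaxDirectMemorySize=15000m "
    "-Dlog.file=/hypothetical/taskmanager.log "
    "org.apache.flink.yarn.YarnTaskManagerRunner"
)


def jvm_options(command_line):
    """Return the tokens that look like JVM options (-X..., -XX:..., -D...)."""
    tokens = shlex.split(command_line)
    return [token for token in tokens if token.startswith(("-X", "-D"))]


for option in jvm_options(SAMPLE_COMMAND):
    print(option)
```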

On Thu, Apr 28, 2016 at 9:00 PM, Ken Krugler <[hidden email]> wrote:
> Hi all,
>
> I’m running jobs on EMR via YARN, and wondering how to check exactly what configuration settings are actually being used.
>
> This is mostly for the TaskManager.
>
> I know I can modify the conf/flink-conf.yaml file, and (via the CLI) I can use -yD param=value.
>
> But my experience with Hadoop makes me want to see the exact values being used, versus assuming I know what’s been set :)



Re: Checking actual config values used by TaskManager

Ken Krugler
Hi Timur,

On Apr 28, 2016, at 10:40pm, Timur Fayruzov <[hidden email]> wrote:

> If you're talking about parameters that were set on JVM startup then `ps aux|grep flink` on an EMR slave node should do the trick, that'll give you the full command line.

No, I’m talking about values that come from flink-conf.yaml.

Maybe there’s no good reason to worry, but in Hadoop land you can have parameters set via the conf on the client, which in turn get overridden by values from conf files on the nodes, which you can then override via command line parameters, which in turn can be changed by the user code.

Plus parameters that can be flagged as final/unmodifiable, and thus some of the above actually don’t change anything.

So it’s a common issue where what you think you set as a value isn’t actually being used, and that’s why examining the job conf that was actually deployed with tasks is critical.
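The layering described above can be sketched as a toy model. This is not Hadoop's or Flink's API: the `Layer` class, the `resolve` function, and the `example.*` keys are all hypothetical, and the only assumption is that later layers win unless an earlier layer marked the key as final.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One source of settings; `final` lists keys that later layers may not change."""
    name: str
    values: dict
    final: set = field(default_factory=set)


def resolve(layers):
    """Apply layers in order and record which layer supplied each effective value."""
    effective, source, locked = {}, {}, set()
    for layer in layers:
        for key, value in layer.values.items():
            if key in locked:
                continue  # an earlier layer flagged this key as final
            effective[key] = value
            source[key] = layer.name
        locked.update(key for key in layer.final if key in layer.values)
    return effective, source


LAYERS = [
    Layer("client conf", {"example.buffer.mb": "100", "example.tmp.dir": "/tmp/client"}),
    Layer("node conf", {"example.buffer.mb": "200", "example.tmp.dir": "/mnt/tmp"},
          final={"example.tmp.dir"}),
    Layer("command line", {"example.buffer.mb": "300", "example.tmp.dir": "/data/tmp"}),
    Layer("user code", {"example.buffer.mb": "400"}),
]

EFFECTIVE, SOURCE = resolve(LAYERS)
for key in sorted(EFFECTIVE):
    print(f"{key} = {EFFECTIVE[key]} (from {SOURCE[key]})")
```

In this model the command-line value for `example.tmp.dir` is silently ignored because the node conf marked it final, which is exactly the kind of surprise that makes inspecting the deployed values worthwhile.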

— Ken




Re: Checking actual config values used by TaskManager

Maximilian Michels
Hi Ken,

When you're running Yarn, the Flink configuration is created once and
shared among all nodes (JobManager and TaskManagers). Please have a
look at the JobManager tab on the web interface. It shows you the
configuration.

Cheers,
Max

On Fri, Apr 29, 2016 at 3:18 PM, Ken Krugler
<[hidden email]> wrote:

> Hi Timur,
>
> On Apr 28, 2016, at 10:40pm, Timur Fayruzov <[hidden email]>
> wrote:
>
> If you're talking about parameters that were set on JVM startup then `ps
> aux|grep flink` on an EMR slave node should do the trick, that'll give you
> the full command line.
>
>
> No, I’m talking about values that come from flink-conf.yaml.
>
> Maybe there’s no good reason to worry, but in Hadoop land you can have
> parameters set via the conf on the client, which in turn get overridden by
> values from conf files on the nodes, which you can then override via command
> line parameters, which in turn can be changed by the user code.
>
> Plus parameters that can be flagged as final/unmodifiable, and thus some of
> the above actually don’t change anything.
>
> So it’s a common issue where what you think you set as a value isn’t
> actually being used, and that’s why examining the job conf that was actually
> deployed with tasks is critical.
>
> — Ken

Re: Checking actual config values used by TaskManager

Ken Krugler
Hi Max,

On May 2, 2016, at 4:43am, Maximilian Michels <[hidden email]> wrote:

> Hi Ken,
>
> When you're running Yarn, the Flink configuration is created once and
> shared among all nodes (JobManager and TaskManagers). Please have a
> look at the JobManager tab on the web interface. It shows you the
> configuration.

I’ve seen that, but the values displayed don’t match what I’m setting, or what I see in the logs.

I’m running a job using ./bin/flink run, with parameters:

-ytm 20000 \
-yjm 2048 \
-ys 4 \
-p 10 \
-yD taskmanager.network.numberOfBuffers=3000 \
-yD taskmanager.memory.off-heap=true

Here’s a screenshot from the JobManager:


If that doesn’t come through, it’s showing:

jobmanager.heap.mb                    256
taskmanager.heap.mb                   512
taskmanager.memory.off-heap           true
taskmanager.network.numberOfBuffers   3000
taskmanager.numberOfTaskSlots         1

So numberOfBuffers seems right, same with memory.off-heap.

But taskmanager.heap.mb looks like a default value, as do numberOfTaskSlots and jobmanager.heap.mb.
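The comparison above can be made mechanical. This is a rough sketch: the mapping from the short flags (-yjm, -ytm, -ys) to configuration keys is an assumption taken from the discussion in this message, and `parse_listing` and `mismatches` are hypothetical helper names.

```python
# Values requested on the command line. The key names for -yjm, -ytm and -ys
# are assumed counterparts, not something the thread confirms.
REQUESTED = {
    "jobmanager.heap.mb": "2048",             # assumed counterpart of -yjm
    "taskmanager.heap.mb": "20000",           # assumed counterpart of -ytm
    "taskmanager.numberOfTaskSlots": "4",     # assumed counterpart of -ys
    "taskmanager.network.numberOfBuffers": "3000",
    "taskmanager.memory.off-heap": "true",
}

# Values as listed on the JobManager page.
DISPLAYED_TEXT = """\
jobmanager.heap.mb 256
taskmanager.heap.mb 512
taskmanager.memory.off-heap true
taskmanager.network.numberOfBuffers 3000
taskmanager.numberOfTaskSlots 1
"""


def parse_listing(text):
    """Turn 'key value' lines into a dict, skipping blank lines."""
    pairs = (line.split(None, 1) for line in text.splitlines() if line.strip())
    return {key: value.strip() for key, value in pairs}


def mismatches(requested, displayed):
    """Return {key: (requested, displayed)} for every key whose values differ."""
    return {
        key: (want, displayed.get(key))
        for key, want in requested.items()
        if displayed.get(key) != want
    }


DIFF = mismatches(REQUESTED, parse_listing(DISPLAYED_TEXT))
for key in sorted(DIFF):
    want, shown = DIFF[key]
    print(f"{key}: requested {want}, displayed {shown}")
```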

When I look at my actual job, the settings I’m seeing for number of slots (as an example) match what I’m specifying from the command line.

When I look at the JobManager logs, I see -Xmx1448M, which I guess is an approximation of the 2048 I specified.
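One reading that reproduces these numbers is sketched below. Nothing in the thread states the rule: the 25% cutoff, the 600 MB minimum cutoff, and the 70% off-heap share are assumptions picked because they match both the -Xmx1448M figure and the TaskManager JVM options quoted later in this message. Flink's actual logic may differ, and `split_container_memory` is a hypothetical name.

```python
def split_container_memory(container_mb, off_heap=False,
                           cutoff_percent=25, cutoff_min_mb=600,
                           managed_percent=70):
    """Guess at how a YARN container size becomes JVM memory options.

    All default constants are assumptions fitted to the logged values.
    """
    # Reserve part of the container for non-heap overhead.
    cutoff_mb = max(cutoff_min_mb, container_mb * cutoff_percent // 100)
    usable_mb = container_mb - cutoff_mb
    if not off_heap:
        return {"heap_mb": usable_mb, "direct_mb": None}
    # With off-heap memory enabled, most of the usable memory leaves the heap.
    managed_mb = usable_mb * managed_percent // 100
    return {"heap_mb": usable_mb - managed_mb, "direct_mb": usable_mb}


JOBMANAGER = split_container_memory(2048)                   # -yjm 2048
TASKMANAGER = split_container_memory(20000, off_heap=True)  # -ytm 20000

print("JobManager: ", JOBMANAGER)
print("TaskManager:", TASKMANAGER)
```

Under these assumptions the JobManager gets 2048 - 600 = 1448 MB of heap, and the TaskManager gets 20000 - 5000 = 15000 MB, of which 4500 MB is heap.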

And when I look at the TaskManager logs, the JVM settings match what I’d expect (for -ytm 20000, so 15GB direct, and about 5GB for the JVM).

2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -  JVM Options:
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xms4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xmx4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -XX:MaxDirectMemorySize=15000m

So I guess I’ve got two questions:

1. What is the meaning of the values I’m seeing in the JobManager UI?

2. How do I figure out what the TaskManager is getting for -yD taskmanager.tmp.dirs, as an example?

Thanks,

— Ken



--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



