Hi all,
I’m running jobs on EMR via YARN, and wondering how to check exactly what configuration settings are actually being used.

This is mostly for the TaskManager.

I know I can modify the conf/flink-conf.yaml file, and (via the CLI) I can use -yD param=value.

But my experience with Hadoop makes me want to see the exact values being used, versus assuming I know what’s been set :)

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
If you're talking about parameters that were set on JVM startup, then `ps aux | grep flink` on an EMR slave node should do the trick; that'll give you the full command line.

On Thu, Apr 28, 2016 at 9:00 PM, Ken Krugler <[hidden email]> wrote:
Hi Timur,
No, I’m talking about values that come from flink-conf.yaml.

Maybe there’s no good reason to worry, but in Hadoop land you can have parameters set via the conf on the client, which in turn get overridden by values from conf files on the nodes, which you can then override via command line parameters, which in turn can be changed by the user code.

Plus parameters can be flagged as final/unmodifiable, and thus some of the above actually don’t change anything.

So it’s a common issue where what you think you set as a value isn’t actually being used, and that’s why examining the job conf that was actually deployed with tasks is critical.

— Ken
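To make the layering described above concrete, here is a minimal Python sketch of a toy configuration object where later layers override earlier ones unless a key has been marked final. It is only an illustration of the idea, not Hadoop's or Flink's actual API; the class `LayeredConf`, its methods, and the keys `sort.buffer.mb` and `tmp.dir` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredConf:
    """Toy model of layered configuration with 'final' keys (hypothetical, not Hadoop's API)."""
    values: dict = field(default_factory=dict)
    sources: dict = field(default_factory=dict)
    final_keys: set = field(default_factory=set)

    def apply(self, source, settings, final=()):
        """Apply one layer; keys that an earlier layer marked final are left untouched."""
        for key, value in settings.items():
            if key in self.final_keys:
                continue  # an earlier layer locked this key
            self.values[key] = value
            self.sources[key] = source
        # Keys named in `final` are locked for all later layers.
        self.final_keys.update(final)

    def explain(self, key):
        """Report the effective value and which layer it came from."""
        return f"{key}={self.values[key]} (from {self.sources[key]})"


# Hypothetical keys and values, applied in the order described above.
conf = LayeredConf()
conf.apply("client conf", {"sort.buffer.mb": 100, "tmp.dir": "/tmp"})
conf.apply("node conf", {"sort.buffer.mb": 200, "tmp.dir": "/mnt/tmp"}, final={"tmp.dir"})
conf.apply("command line", {"sort.buffer.mb": 300, "tmp.dir": "/home/me/tmp"})
conf.apply("user code", {"sort.buffer.mb": 400})

print(conf.explain("sort.buffer.mb"))  # sort.buffer.mb=400 (from user code)
print(conf.explain("tmp.dir"))         # tmp.dir=/mnt/tmp (from node conf)
```

The command-line value for `tmp.dir` is silently ignored here because the node layer marked it final, which is exactly the kind of surprise that makes inspecting the deployed configuration worthwhile.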
Hi Ken,
When you're running on YARN, the Flink configuration is created once and shared among all nodes (JobManager and TaskManagers). Please have a look at the JobManager tab on the web interface. It shows you the configuration.

Cheers,
Max

On Fri, Apr 29, 2016 at 3:18 PM, Ken Krugler <[hidden email]> wrote:
> No, I’m talking about values that come from flink-conf.yaml.
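The configuration shown in the JobManager tab could also be checked from a script. The Python sketch below assumes that the web interface serves the same data as JSON at `/jobmanager/config`, as a list of `{"key": ..., "value": ...}` objects; that endpoint, the response shape, and the function names are assumptions to verify against your Flink version. The sample keys and values at the bottom are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_jobmanager_config(base_url):
    """Return the JobManager's configuration as a dict.

    Assumption: GET <base_url>/jobmanager/config returns a JSON list of
    {"key": ..., "value": ...} objects.
    """
    with urlopen(base_url.rstrip("/") + "/jobmanager/config") as resp:
        entries = json.load(resp)
    return {entry["key"]: entry["value"] for entry in entries}


def diff_config(expected, actual):
    """Map each mismatching key to (expected, actual); a missing key shows None.

    Values are compared as strings, so pass expected values as strings.
    """
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if str(actual.get(key)) != str(want)
    }


# Made-up sample data standing in for a real response.
reported = {"taskmanager.network.numberOfBuffers": "3000",
            "taskmanager.numberOfTaskSlots": "1"}
expected = {"taskmanager.network.numberOfBuffers": "3000",
            "taskmanager.numberOfTaskSlots": "4"}
print(diff_config(expected, reported))
```

Against a live cluster, `reported` would come from `fetch_jobmanager_config(...)` with the URL of the web interface (on YARN, typically the tracking URL of the application).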
Hi Max,
I’m running a job using ./bin/flink run, with parameters:

    -ytm 20000 \
    -yjm 2048 \
    -ys 4 \
    -p 10 \
    -yD taskmanager.network.numberOfBuffers=3000 \
    -yD taskmanager.memory.off-heap=true

Here’s a screenshot from the JobManager. If that doesn’t come through, it’s showing:

    jobmanager.heap.mb                    256
    taskmanager.heap.mb                   512
    taskmanager.memory.off-heap           true
    taskmanager.network.numberOfBuffers   3000
    taskmanager.numberOfTaskSlots         1

So numberOfBuffers seems right, same with memory.off-heap. But taskmanager.heap.mb looks like a default value, same for numberOfTaskSlots and jobmanager.heap.mb.

When I look at my actual job, the settings I’m seeing for number of slots (as an example) match what I’m specifying from the command line.

When I look at the JobManager logs, I see -Xmx1448M, which I guess is an approximation of the 2048 I specified.

And when I look at the TaskManager logs, the JVM settings match what I’d expect (for -ytm 20000, so 15GB direct, and about 5GB for the JVM).

    2016-05-05 01:07:16,161 INFO org.apache.flink.yarn.YarnTaskManagerRunner - JVM Options:
    2016-05-05 01:07:16,161 INFO org.apache.flink.yarn.YarnTaskManagerRunner - -Xms4500m
    2016-05-05 01:07:16,161 INFO org.apache.flink.yarn.YarnTaskManagerRunner - -Xmx4500m
    2016-05-05 01:07:16,161 INFO org.apache.flink.yarn.YarnTaskManagerRunner - -XX:MaxDirectMemorySize=15000m

So I guess I’ve got two questions…

1. What is the meaning of the values I’m seeing in the JobManager UI?
2. How do I figure out what the TaskManager is getting for -yD taskmanager.tmp.dirs, as an example?

Thanks,

— Ken
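The JVM options in the logs above are consistent with a simple cutoff calculation. The Python sketch below reproduces them under three assumptions that are not stated anywhere in this thread: a YARN heap cutoff of 25% of the container size, a minimum cutoff of 600 MB, and a managed-memory fraction of 0.7 that goes off-heap. The function and parameter names are hypothetical, and the actual defaults may differ between Flink versions.

```python
def yarn_heap_after_cutoff(container_mb, cutoff_ratio=0.25, cutoff_min_mb=600):
    """Memory left for the JVM after reserving a safety cutoff for the container.

    cutoff_ratio and cutoff_min_mb are assumed values, chosen because they
    reproduce the numbers seen in the logs.
    """
    cutoff = max(cutoff_min_mb, round(container_mb * cutoff_ratio))
    return container_mb - cutoff


def taskmanager_off_heap_split(container_mb, memory_fraction=0.7):
    """Return (heap_mb, max_direct_mb) for a TaskManager with off-heap memory enabled.

    Assumes memory_fraction of the usable memory is managed off-heap, and the
    direct memory limit is set to the whole usable amount.
    """
    usable = yarn_heap_after_cutoff(container_mb)
    off_heap = round(usable * memory_fraction)
    return usable - off_heap, usable


print(yarn_heap_after_cutoff(2048))       # 1448, matching -Xmx1448M for -yjm 2048
print(taskmanager_off_heap_split(20000))  # (4500, 15000), matching -Xmx4500m and MaxDirectMemorySize=15000m
```

If these assumptions hold, the -Xmx values are the container sizes minus the cutoff rather than approximations, and the heap.mb entries shown in the JobManager UI do not take part in the calculation.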