JMX stats reporter with all task manager/job manager stats aggregated?

Ajay Tripathy
Hi, I'm running Flink JobManagers/TaskManagers on YARN. I've turned on the JMX reporter in my flink-conf.yaml as follows:

metrics.reporters: jmx

metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter


I was wondering:

Is there a JMX server that exposes the aggregated stats across all jobs/tasks? If so, where is it located? It appears that a separate JMX server starts for every TaskManager, and the JobManagers do not have the data reported by the TaskManagers.


I'm not sure if this is related, but when I try to specify a port for the JMX reporter, like this:

metrics.reporter.jmx.port: 8789

I get an error: the JMX servers of different TaskManagers compete for that port and fail to start.

Re: JMX stats reporter with all task manager/job manager stats aggregated?

Ajay Tripathy
Sorry, I neglected to include the stack trace for the JMX reporter failing to instantiate on a TaskManager:

2017-08-05 00:59:09,388 INFO  org.apache.flink.runtime.metrics.MetricRegistry               - Configuring JMXReporter with {port=8789, class=org.apache.flink.metrics.jmx.JMXReporter}.
2017-08-05 00:59:09,402 ERROR org.apache.flink.runtime.metrics.MetricRegistry               - Could not instantiate metrics reporter jmx. Metrics might not be exposed/reported.
java.lang.RuntimeException: Could not start JMX server on any configured port. Ports: 8789
	at org.apache.flink.metrics.jmx.JMXReporter.open(JMXReporter.java:127)
	at org.apache.flink.runtime.metrics.MetricRegistry.<init>(MetricRegistry.java:120)
	at org.apache.flink.runtime.taskmanager.TaskManager$.createTaskManagerComponents(TaskManager.scala:2114)
	at org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1873)
	at org.apache.flink.runtime.taskmanager.TaskManager$.runTaskManager(TaskManager.scala:1769)
	at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1637)
	at org.apache.flink.runtime.taskmanager.TaskManager.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala)
	at org.apache.flink.yarn.YarnTaskManagerRunner$1.call(YarnTaskManagerRunner.java:146)
	at org.apache.flink.yarn.YarnTaskManagerRunner$1.call(YarnTaskManagerRunner.java:142)
	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
	at org.apache.flink.yarn.YarnTaskManagerRunner.runYarnTaskManager(YarnTaskManagerRunner.java:142)
	at org.apache.flink.yarn.YarnTaskManager$.main(YarnTaskManager.scala:64)
	at org.apache.flink.yarn.YarnTaskManager.main(YarnTaskManager.scala)

Re: JMX stats reporter with all task manager/job manager stats aggregated?

Chesnay Schepler
Hello,

There is no central place where JMX metrics are aggregated.

You can configure a port range for the reporter to prevent port conflicts on the same machine.

metrics.reporter.jmx.port: 8789-8790

You can find out which port was used by checking the logs.

Regards,
Chesnay
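
To illustrate why a port range avoids the conflict described in this thread, here is a minimal sketch of the "try each port in the range until one binds" idea. It is not Flink's actual JMXReporter code: the class PortRangeSketch and its methods parsePorts and bindFirstFree are hypothetical names chosen for this example, and plain ServerSocket binding stands in for starting a JMX server. The error message is only modeled on the one in the stack trace above.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics "try each port in a range until one binds". Not Flink code. */
final class PortRangeSketch {

    /** Parses a spec such as "8789" or "8789-8790" into the list of candidate ports. */
    static List<Integer> parsePorts(String spec) {
        String[] parts = spec.trim().split("-");
        if (parts.length < 1 || parts.length > 2) {
            throw new IllegalArgumentException("Invalid port spec: " + spec);
        }
        int start = Integer.parseInt(parts[0].trim());
        int end = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : start;
        if (end < start) {
            throw new IllegalArgumentException("Invalid port range: " + spec);
        }
        List<Integer> ports = new ArrayList<>();
        for (int p = start; p <= end; p++) {
            ports.add(p);
        }
        return ports;
    }

    /** Binds the first free candidate port; the caller must close the returned socket. */
    static ServerSocket bindFirstFree(List<Integer> candidates) {
        for (int port : candidates) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port is already taken, e.g. by another process on this host; try the next one.
            }
        }
        throw new RuntimeException("Could not bind any configured port. Ports: " + candidates);
    }

    public static void main(String[] args) throws IOException {
        // Two "processes" on the same host sharing the same configured range.
        List<Integer> candidates = parsePorts("8789-8790");
        try (ServerSocket first = bindFirstFree(candidates);
             ServerSocket second = bindFirstFree(candidates)) {
            System.out.println("first bound to port " + first.getLocalPort());
            System.out.println("second bound to port " + second.getLocalPort());
        }
    }
}
```

With a single configured port, the second bind attempt in main would fail, which mirrors the error in the stack trace. By the same logic, a range of two ports covers at most two processes per machine, so the range presumably needs to be at least as large as the number of JobManager/TaskManager containers YARN may place on one host.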
