Task Manager metrics per job on Flink 0.9.1


Task Manager metrics per job on Flink 0.9.1

Pieter Hameete
Hi people!

A lot of metrics are gathered for each TaskManager every few seconds. The web UI shows nice graphs for some of these metrics too. 

I would like to make graphs of the memory and CPU usage, and of the time spent on garbage collection, for each job. So I am wondering whether the metrics are also stored somewhere, or whether there is an option to enable storing the metrics per job.

In the configuration documentation I could not find such an option. Is this possible in version 0.9.1 of Flink? If not, is it possible in Flink 0.10.1, or could such a feature be requested or developed?

Thank you for your help and kind regards,

Pieter



Re: Task Manager metrics per job on Flink 0.9.1

Ritesh Kumar Singh
Going by the list in the latest documentation for the Flink 0.10.1 release, memory and CPU stats are not stored, and neither is the time spent on garbage collection.
 
In my opinion, storing these metrics would also degrade the performance of jobs, so it's basically a trade-off between performance and computation cost. For me, the web UI hangs even with the current set of parameters :( 


In reply to Pieter Hameete, Tue, Jan 26, 2016 at 7:16 PM




Re: Task Manager metrics per job on Flink 0.9.1

Pieter Hameete
Hi Ritesh,

thanks for the response! The metrics are already being gathered, though, so I think it would be nice to have a configuration option to log them somewhere. It doesn't have to be enabled by default, and I don't think it should degrade performance very much. It looks like the metrics are already sent with each heartbeat by default. Your web UI probably hangs because it has to update all the graphs on every heartbeat; with many TaskManagers, that will be heavy on your computer :-)

- Pieter

In reply to Ritesh Kumar Singh, 2016-01-26 20:17 GMT+01:00





Re: Task Manager metrics per job on Flink 0.9.1

Ritesh Kumar Singh
I didn't know these stats were collected. Thanks for pointing that out :)
In that case, it should definitely be a feature which can be enabled via config files.


In reply to Pieter Hameete, Tue, Jan 26, 2016 at 8:22 PM



Re: Task Manager metrics per job on Flink 0.9.1

Fabian Hueske-2
Hi,

it is correct that the metrics are collected from the task managers.
In Flink 0.9.1 the metrics are visualized as charts in the web dashboard.
This visualization was removed when the dashboard was redesigned and updated for 0.10, but it will hopefully be added again.

For Flink 0.9.1, the metrics are cached in memory for plotting and not persisted anywhere. The JobManager exposes the stats via a REST-like interface. You would need to check the code of the web dashboard to get the correct URL.
With the updated web UI of 0.10.1 all job information and stats are exposed via well defined REST interfaces: https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/monitoring_rest_api.html

In both cases, you can periodically poll the interfaces to collect stats.
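A minimal polling sketch in Python, assuming the JobManager's web server is reachable at localhost:8081 and that `/taskmanagers` is the endpoint you want; both `DEFAULT_URL` and the helper names below are hypothetical, so check the path against the REST API documentation linked above (or, for 0.9.1, against the dashboard code). It keeps each raw JSON response together with a timestamp, one JSON document per line, so the parsing can be done later.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical address: the host, port and path are assumptions that must be
# checked against the REST API of your Flink version.
DEFAULT_URL = "http://localhost:8081/taskmanagers"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode one JSON document from a REST endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def poll(fetch: Callable[[], dict],
         interval_s: float,
         samples: int,
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.time) -> Iterator[Tuple[float, dict]]:
    """Yield `samples` (timestamp, snapshot) pairs, `interval_s` seconds apart."""
    for i in range(samples):
        yield clock(), fetch()
        if i < samples - 1:
            # No need to wait after the last sample.
            sleep(interval_s)


def append_jsonl(path: str, records: Iterable[Tuple[float, dict]]) -> int:
    """Append each (timestamp, snapshot) pair as one JSON line; return the count."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for ts, snapshot in records:
            f.write(json.dumps({"ts": ts, "data": snapshot}) + "\n")
            written += 1
    return written
```

For example, `append_jsonl("tm_metrics.jsonl", poll(lambda: fetch_json(DEFAULT_URL), interval_s=5.0, samples=12))` would record roughly one minute of snapshots. Storing the raw responses instead of selected fields means nothing is lost if the JSON layout differs from what you expected; `fetch`, `sleep` and `clock` are passed in so the loop can be exercised without a running cluster.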

Best, Fabian




In reply to Ritesh Kumar Singh, 2016-01-26 21:22 GMT+01:00







Re: Task Manager metrics per job on Flink 0.9.1

Till Rohrmann
In reply to this post by Ritesh Kumar Singh
Hi Pieter,

you're right that it would be nice to record the metrics for later analysis. However, at the moment this is not supported. You could use the REST interface to obtain the JSON representation of the data shown in the web interface. By doing this repeatedly and parsing the metric data, you can store it.
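Parsing the collected snapshots could look like the following Python sketch. It assumes each snapshot contains a Codahale/Dropwizard-Metrics-style `gauges` map and that garbage-collector gauges are named like `gc.<collector>.time`, holding cumulative milliseconds as the JVM's own `GarbageCollectorMXBean` counters do; these names and the layout are assumptions, not a documented Flink format. Because the values are cumulative, the sketch differences consecutive samples to get the GC time spent per polling interval. Note that these are per-TaskManager (JVM-wide) figures; they only describe a single job if that job is the only one running on the TaskManager.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed snapshot layout (hypothetical, verify against the JSON you receive):
# {"gauges": {"gc.PS-Scavenge.time": {"value": 120},
#             "memory.heap.used": {"value": 52428800}}}
GC_PREFIX = "gc."
GC_TIME_SUFFIX = ".time"


def gauge_values(snapshot: dict) -> Dict[str, float]:
    """Flatten {"gauges": {name: {"value": v}}} into {name: v}, skipping non-numbers."""
    values = {}
    for name, gauge in snapshot.get("gauges", {}).items():
        value = gauge.get("value") if isinstance(gauge, dict) else None
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values[name] = float(value)
    return values


def total_gc_time_ms(snapshot: dict) -> float:
    """Sum the cumulative collection time over all garbage collectors."""
    return sum(value for name, value in gauge_values(snapshot).items()
               if name.startswith(GC_PREFIX) and name.endswith(GC_TIME_SUFFIX))


def gc_time_per_interval(
        samples: Iterable[Tuple[float, dict]]) -> List[Tuple[float, float]]:
    """Convert cumulative GC time into (timestamp, ms of GC since previous sample)."""
    intervals = []
    previous: Optional[float] = None
    for timestamp, snapshot in samples:
        current = total_gc_time_ms(snapshot)
        if previous is not None:
            # A drop in the cumulative value suggests the TaskManager JVM
            # restarted, so the new value is taken as the time since restart.
            delta = current - previous if current >= previous else current
            intervals.append((timestamp, delta))
        previous = current
    return intervals
```

Feeding it the (timestamp, snapshot) pairs gathered by repeated polling yields a per-interval GC series that can be plotted directly; heap usage can be read the same way from `gauge_values`, under the same naming assumption.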

But I agree that this is not very nice. Maybe you could open a JIRA ticket to add this feature.

Cheers,
Till








Re: Task Manager metrics per job on Flink 0.9.1

Pieter Hameete
Hi Fabian and Till,

thanks for the tips, I'll see if I can work with the REST interface for now. I'll open a JIRA ticket as well. I might even be able to develop this feature, but I won't have time to do that in the coming two months. It would be nice to make a first contribution, though. Keep up the good work :-)

- Pieter

In reply to Till Rohrmann, 2016-01-27 11:04 GMT+01:00