Monitoring and alerting mechanisms for Flink on YARN

Soumya Simanta
We are about to deploy a Flink job on YARN in production. Since it is a long-running process, we want to have alerting and monitoring mechanisms in place.

Any existing solutions, or suggestions for implementing a new one, would be appreciated.

Thanks! 
Re: Monitoring and alerting mechanisms for Flink on YARN

mnxfst
Hi Soumya,

We use a StatsD / Graphite setup to extract metrics from our running Flink applications. For alerting and monitoring based on time series, at least, it works very well. Take a look at https://github.com/tim-group/java-statsd-client, which is used widely throughout our source code.
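For readers unfamiliar with StatsD, the following is a minimal sketch of what emitting such metrics looks like at the wire level. It is not the setup described above and does not use java-statsd-client; it only formats lines in the StatsD text protocol (`name:value|type`) and sends them over UDP with the Java standard library. The class name `StatsdSketch`, the metric names, and the host and port (`localhost:8125`, the conventional StatsD default) are assumptions for illustration.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class StatsdSketch {

    // Counter increment in the StatsD line protocol: <name>:<delta>|c
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    // Gauge (absolute, non-negative value): <name>:<value>|g
    static String gauge(String name, long value) {
        return name + ":" + value + "|g";
    }

    // Timing in milliseconds: <name>:<millis>|ms
    static String timing(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    // Sends one metric line as a single UDP datagram (fire and forget).
    static void send(String host, int port, String line) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(
                    payload, payload.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed metric names; a real job would call these from its operators.
        send("localhost", 8125, counter("flink.myjob.records", 1));
        send("localhost", 8125, gauge("flink.myjob.buffered", 42));
        send("localhost", 8125, timing("flink.myjob.window_latency", 130));
    }
}
```

Because UDP is connectionless, a missing or slow StatsD daemon does not block the job; the trade-off is that metrics can be silently dropped. Alert rules would then be defined on the resulting Graphite time series.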

Best
  Christian Kreutzfeldt


Re: Monitoring and alerting mechanisms for Flink on YARN

Flavio Pompermaier
Very interesting! Could you please provide more details about its usage in your deployment?

Thanks,
Flavio

Re: Monitoring and alerting mechanisms for Flink on YARN

Stephan Ewen
There is also a substantial ongoing effort to create and expose more metrics via JMX.

Part of that is in the JIRA below, and an additional proposal and design will be published in the next few days.
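To illustrate what exposing a metric via JMX means in general, here is a minimal sketch using only the standard `javax.management` API. It does not show Flink's metrics design, which had not been published at the time of this thread; the class names (`JmxMetricSketch`, `RecordCounter`) and the object name `sketch.metrics:type=RecordCounter` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example, not Flink's API.
public class JmxMetricSketch {

    // Standard MBean convention: the management interface is named
    // <implementation class name> + "MBean" and must be public.
    public interface RecordCounterMBean {
        long getCount(); // exposed as the read-only attribute "Count"
    }

    public static class RecordCounter implements RecordCounterMBean {
        private final AtomicLong count = new AtomicLong();

        public void increment() {
            count.incrementAndGet();
        }

        @Override
        public long getCount() {
            return count.get();
        }
    }

    // Registers the counter with the JVM's platform MBean server.
    static ObjectName register(RecordCounter counter, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(counter, objectName);
        return objectName;
    }

    public static void main(String[] args) throws Exception {
        RecordCounter counter = new RecordCounter();
        ObjectName name = register(counter, "sketch.metrics:type=RecordCounter");
        counter.increment();

        // Read the attribute back the same way a JMX client would.
        Object value = ManagementFactory.getPlatformMBeanServer()
                .getAttribute(name, "Count");
        System.out.println(name + " Count=" + value);
    }
}
```

Once registered, the attribute can be inspected with a JMX client such as JConsole, or polled by a monitoring agent that forwards values to an alerting system.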
