Monitoring and alerting mechanisms for Flink on YARN
We are about to deploy a Flink job on YARN in production. Since it is a long-running process, we want to have alerting and monitoring mechanisms in place.
Any existing solutions, or suggestions for implementing a new one, would be appreciated.
Re: Monitoring and alerting mechanisms for Flink on YARN
Hi Soumya,
We are using a StatsD / Graphite setup to extract metrics from our running Flink applications. At least for alerting and monitoring based on time series, it works perfectly well. Take a look at https://github.com/tim-group/java-statsd-client, which we use widely in our code.
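For readers unfamiliar with StatsD, here is a rough sketch of what such a client does on the wire: it formats each metric as a short text line (`name:value|type`) and sends it over UDP to the StatsD daemon, which aggregates the values and forwards them to Graphite. This is not the java-statsd-client API, just a standard-library illustration of the idea. The class name `StatsdSketch`, the prefix `flink.myjob`, the metric names, and the host are assumptions made up for the example; 8125 is the usual default StatsD port.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal StatsD sender: formats metrics as text lines and sends them over UDP. */
public class StatsdSketch implements AutoCloseable {
    private final String prefix;
    private final InetAddress host;
    private final int port;
    private final DatagramSocket socket;

    public StatsdSketch(String prefix, String hostName, int port) throws IOException {
        this.prefix = prefix;
        this.host = InetAddress.getByName(hostName);
        this.port = port;
        this.socket = new DatagramSocket();
    }

    /** Builds a StatsD line such as "flink.myjob.records.processed:1|c". */
    public static String format(String prefix, String name, long value, String type) {
        return prefix + "." + name + ":" + value + "|" + type;
    }

    /** Counter: the daemon sums the deltas per flush interval. */
    public void count(String name, long delta) { send(format(prefix, name, delta, "c")); }

    /** Gauge: the daemon keeps the last reported value. */
    public void gauge(String name, long value) { send(format(prefix, name, value, "g")); }

    /** Timer: a duration in milliseconds, aggregated by the daemon. */
    public void timing(String name, long millis) { send(format(prefix, name, millis, "ms")); }

    private void send(String line) {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try {
            socket.send(new DatagramPacket(payload, payload.length, host, port));
        } catch (IOException e) {
            // Metrics are best effort: a monitoring failure should not break the job.
        }
    }

    @Override
    public void close() { socket.close(); }

    public static void main(String[] args) throws IOException {
        // Host, port, prefix, and metric names are placeholders for illustration.
        try (StatsdSketch statsd = new StatsdSketch("flink.myjob", "localhost", 8125)) {
            long start = System.currentTimeMillis();
            statsd.count("records.processed", 1);
            statsd.gauge("queue.size", 42);
            statsd.timing("batch.duration", System.currentTimeMillis() - start);
        }
    }
}
```

In a Flink job, a sender like this would typically be created once per operator instance (for example in the `open()` method of a rich function) and closed in `close()`. Because UDP is fire-and-forget, a missing StatsD daemon does not block the job. Alerting is then defined on the resulting Graphite time series.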
Re: Monitoring and alerting mechanisms for Flink on YARN
Very interesting! Could you please provide more details about its usage in your deployment?
Thanks,
Flavio
On Thu, Apr 14, 2016 at 11:25 PM, Christian Kreutzfeldt <[hidden email]> wrote:
Hi Soumya,
We are using a StatsD / Graphite setup to extract metrics from our running Flink applications. At least for alerting and monitoring based on time series, it works perfectly well. Take a look at https://github.com/tim-group/java-statsd-client, which we use widely in our code.