Re: Using Prometheus Client Metrics in Flink
Posted by Meissner, Dylan
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Using-Prometheus-Client-Metrics-in-Flink-tp41768p41793.html
Hi Rion,
Regarding the question about adding Prometheus labels out of the box: this is a common ask of all exporters, but the Prometheus philosophy treats it as an "anti-pattern", since the metrics source is often unaware of the context it runs in. See [0] for an example of such a discussion.
Instead, we can establish context during service discovery. If, for example, we run clusters for tenants on Kubernetes, then within the kubernetes_sd_config [1] labelling rules we can instruct Prometheus to add Kubernetes labels from the pods, such as "tenant-id: foo" and "environment: staging", to each incoming metric it processes.
This isn't limited to Kubernetes; each of the service discovery configs is designed to accommodate translating metadata from its context into metric labels.
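As a sketch of the Kubernetes case (scrape job name and label names are illustrative, matching the "tenant-id"/"environment" pod labels above), the relabelling rules might look like this — note that Prometheus sanitizes the dash in "tenant-id" when building the meta label:

```yaml
scrape_configs:
  - job_name: flink-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy the pod's "tenant-id" label onto every scraped series.
      - source_labels: [__meta_kubernetes_pod_label_tenant_id]
        target_label: tenant_id
      # Likewise for "environment: staging".
      - source_labels: [__meta_kubernetes_pod_label_environment]
        target_label: environment
```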
If this doesn't work for you, then consider encoding the tenant identifier into job names and extracting it with a metric_relabel_config [2].
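For the job-name approach, a metric_relabel_configs rule can pull the tenant back out. Assuming (hypothetically) job names of the form "tenant-<id>-..." exposed by Flink's Prometheus reporter in a job_name label, a sketch might be:

```yaml
metric_relabel_configs:
  # Assumes job names follow "tenant-<id>-..."; extract the id into
  # its own label on every metric ingested from this target.
  - source_labels: [job_name]
    regex: 'tenant-([^-]+)-.*'
    target_label: tenant_id
    replacement: '$1'
```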
Hi Prasanna,
Thanks for that. It’s what I was doing previously as a workaround, but I was curious whether there was any Flink-specific functionality to handle this before the metrics reach Prometheus.
Additionally, from the docs on metrics [0], it seems there’s a pattern in place for using supported third-party metrics, such as those from Codahale/Dropwizard, via a Maven package (flink-metrics-dropwizard). I see a similarly named package for Prometheus (flink-metrics-prometheus), which may be what I’m looking for, so I may give that a try.
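For anyone following along, enabling that reporter is mostly configuration; per the Flink docs, something like the following in flink-conf.yaml (the port range here is just an example) exposes an endpoint Prometheus can scrape:

```yaml
# flink-conf.yaml: enable the Prometheus reporter from the
# flink-metrics-prometheus artifact (jar must be on the classpath).
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# Each JobManager/TaskManager picks a free port from this range.
metrics.reporter.prom.port: 9250-9260
```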
Thanks,
Rion
Rion,
Regarding the second question, you can aggregate by using the sum function: sum(metric_name{job_name="JOBNAME"}). This works if you are using the counter metric type.
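As a concrete sketch (metric and label names are placeholders from this thread; Flink's Prometheus reporter exposes the Flink job name as a job_name label), the per-task series can be collapsed into one total like so:

```promql
# One total for the whole job, keeping any tenant label that was
# attached via relabelling; drop the "by" clause for a single series.
sum by (tenant_id) (total_messages{job_name="JOBNAME"})
```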
Prasanna.
Hi folks,
I’ve just recently started working with Flink and I was in the process of adding some metrics through my existing pipeline with the hopes of building some Grafana dashboards with them to help with observability.
Initially I looked at the built-in Flink metrics that were available, but I didn’t see an easy mechanism for setting/using labels with them. Essentially, I have two properties on the messages coming through the pipeline that I’d like to keep track of (tenant/source) across several metrics (e.g. total_messages with tenant/source labels). I didn’t see an easy way to adjust this out of the box, and I wasn’t aware of a good pattern for handling it.
I had previously used the Prometheus client metrics [0] to accomplish this, but I wasn’t entirely sure how they would mesh with Flink. Does anyone have experience working with them, or know whether they are supported?
Secondly, when using the Flink metrics, I noticed I was receiving a separate metric for each task that was spun up. Is there an “easy button” for aggregating these, to ensure that a single metric (e.g. total_messages) reflects the total processed across all of the tasks instead of each individual one?
Any recommendations / resources / advice would be greatly appreciated!
Thanks,
Rion