Hello there,

I am using Flink's Prometheus reporter to push metrics to Prometheus and then use Grafana for visualization. There are metrics like flink_taskmanager_Status_JVM_CPU_Load, flink_taskmanager_Status_JVM_CPU_Time, etc. which do not carry a job_name; they are tied to an instance. When running multiple jobs in the same YARN cluster it is possible that different jobs have YARN containers on the same instance, and in this case it is very difficult to find out which job is causing the high CPU load, memory usage, etc. on that instance. Is there a way to tag job_name onto these metrics so that they can be visualized per job?

Thanks,
Hemant
When you say "job_name", are you referring to the Prometheus concept of jobs, or the one from Flink? Which of Flink's Prometheus reporters are you using?

On 2/17/2021 7:37 PM, bat man wrote:
I meant the Flink job name. I'm using the below reporter -

Is there any way to tag job names to the TaskManager and JobManager metrics?

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler <[hidden email]> wrote:
No, Job-/TaskManager metrics cannot be tagged with the job name. The reason is that this only makes sense for application clusters (as opposed to session clusters), but we don't differentiate between the two when it comes to metrics.
On 2/19/2021 3:59 AM, bat man wrote:
Is there a way I can see, for a specific job, the CPU or memory usage of its YARN containers when multiple jobs are running on the same cluster? The issue I'm trying to resolve is high memory usage in one of the containers; I want to isolate the issue to one job and then investigate further.

Thanks,
Hemant

On Fri, 19 Feb 2021 at 12:18 PM, Chesnay Schepler <[hidden email]> wrote:
Hmm... in a roundabout way this could be possible, I suppose.

For a given job, search through your metrics for some job metric (like numRestarts on the JM, or any task metric for TMs), and from that you should be able to infer the JM/TM that belongs to that job (based on the TM ID / host information in the metric).

Conversely, when you see high CPU usage in one of the metrics for a JM/TM, search for a job metric from that same process.
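The lookup described above could be sketched as a PromQL label join. This is only a sketch under assumptions not confirmed in the thread: that the reporter attaches a `tm_id` label to TaskManager metrics, and both `tm_id` and `job_name` labels to task-scoped metrics such as `flink_taskmanager_job_task_numRecordsIn`; actual label and metric names depend on your reporter and `metrics.scope` configuration.

```promql
# Hypothetical join: attach job_name to per-TaskManager CPU load by
# matching on the tm_id label of a task-scoped metric.
# The 'group by' aggregation returns 1 per (tm_id, job_name) group, so
# multiplying leaves the CPU value unchanged while copying job_name in.
# Metric and label names here are assumptions -- adjust to whatever
# your reporter actually exposes.
flink_taskmanager_Status_JVM_CPU_Load
  * on (tm_id) group_left (job_name)
    group by (tm_id, job_name) (flink_taskmanager_job_task_numRecordsIn)
```

Note that this match only stays one-to-one while each TaskManager runs tasks of a single job (the application-cluster case); on a session cluster where one TM serves several jobs, the right-hand side yields multiple series per `tm_id` and the join fails, which mirrors the application-vs-session caveat from earlier in the thread.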
On 2/19/2021 9:14 AM, bat man wrote: