(DEPRECATED) Apache Flink User Mailing List archive.

Metric counter gets reset when leader jobmanager changes in Flink native K8s HA solution

Amit Bhatia

Hi,

We have configured JobManager HA with Flink 1.12.1 on Kubernetes, using the native Kubernetes HA solution (https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink). Both the JobManager and TaskManager pods are managed by Deployment controllers.

When the leader JobManager crashes and the same JobManager becomes leader again, everything works fine. But when the leader JobManager crashes and a standby JobManager becomes leader instead, the metric counter is reset and the request count starts again from 1. Is this the expected behaviour? Or is there a specific configuration that lets the count continue, rather than reset, when the leader JobManager changes?

Regards,

Amit

Prasanna kumar

Re: Metric counter gets reset when leader jobmanager changes in Flink native K8s HA solution

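Amit,

This is expected behaviour for a counter. If you need the total count irrespective of restarts, apply aggregate functions to the counter in the query instead of reading its raw value. Example: sum(rate(counter[5m])), where the 5m range is only illustrative. In Prometheus, rate() and increase() account for counter resets. See https://prometheus.io/docs/prometheus/latest/querying/functions/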
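To illustrate why a reset-aware function still yields a continuous total, here is a minimal Python sketch of the idea. It is not Prometheus's actual implementation (which also extrapolates over the query window). The function name and the sample values are hypothetical, and it assumes that any drop in the value means the counter restarted from zero.

```python
from typing import Sequence


def total_increase(samples: Sequence[float]) -> float:
    """Sum the growth of a counter across scrapes, treating any drop as a reset.

    Hypothetical helper for illustration only; it mimics the basic idea behind
    reset handling and ignores the extrapolation a real query engine applies.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            # Normal case: the counter only moved forward.
            total += curr - prev
        else:
            # The value dropped, so assume the counter restarted from zero
            # (e.g. a new leader JobManager); everything in curr is new growth.
            total += curr
    return total


# Made-up scrape values: the counter falls back to 1 after a leader change.
samples = [1, 5, 9, 1, 4, 7]
print(total_increase(samples))  # prints 15.0
```

The raw value after the failover is 7, but the reset-aware total is 15, because the growth before the reset (8) is kept and the growth after it (1 + 3 + 3) is added on top.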

Prasanna.

(In reply to Amit Bhatia's message of Tue, Jun 15, 2021 at 8:25 AM.)