Apache Flink - Are counters reliable and accurate ?

M Singh
Hi:

I need to collect application metrics that are counts (per unit of time, e.g. per minute) of certain events. There are two ways of doing this:

1. Create separate streams (using split stream etc.) in the application explicitly, then aggregate the counts in a window and save them. This mixes metrics collection with application logic and makes the application logic complex.
2. Use the Flink metrics framework (counter, gauge, etc.) to save metrics.

I have a very small test with 2 events, but when I run the application the counters are not being saved (they show value 0) even though that part of the code is being executed. I do see the numRecordsIn counters being updated from the source operator. I've also tried incrementing the count by 10 (instead of 1) every time the function is executed, but the counts still remain 0.

Here is a snippet of the code:

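dataStream.map(new RichMapFunction<String, String>() {

            // Registered in open(), so the field does not need to be
            // serialized along with the function.
            private transient Counter counter;

            @Override
            public void open(Configuration parameters) throws Exception {
                // Registers the counter "success" under the user group test/split.
                counter = getRuntimeContext().getMetricGroup().addGroup("test", "split").counter("success");
            }

            @Override
            public String map(String value) throws Exception {
                // Count every record that passes through this operator.
                counter.inc();
                return value;
            }
        });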


As I mentioned, the success metric does get reported, but its value is always 0, even though the above map function was executed.

My questions are:

1. Are there any issues with counters being approximate?
2. If I want to collect accurate counts, is it recommended to use counters, or should I do it explicitly (which makes the code too complex)?
3. Do counters participate in Flink's failure/checkpointing/recovery?
4. Is there any better way of collecting application metric counts?

Thanks

Mans

Re: Apache Flink - Are counters reliable and accurate ?

Chesnay Schepler
1) None that I'm aware of.
2) You should use counters.
3) No, counters are not checkpointed, but you could store the value in state yourself.
4) None that I'm aware of that doesn't require modifications to the application logic.
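
To illustrate point 3, here is a minimal plain-Java sketch of the idea, not Flink code. `RestorableCounter` is a hypothetical stand-in for a metric counter. `snapshot()` and `restore()` model writing the value into checkpointed state and reading it back after recovery. In a real job this would presumably go through Flink's state APIs (for example by implementing `CheckpointedFunction`), which this sketch does not attempt to reproduce.

```java
/** Hypothetical stand-in for a metric counter whose value can be saved and restored. */
final class RestorableCounter {
    private long count;

    public void inc() { count++; }

    public void inc(long n) { count += n; }

    public long getCount() { return count; }

    /** Returns the value that would be written into checkpointed state. */
    public long snapshot() { return count; }

    /** Rebuilds a counter after recovery, starting from the last snapshot value. */
    public static RestorableCounter restore(long snapshotValue) {
        RestorableCounter restored = new RestorableCounter();
        restored.inc(snapshotValue);
        return restored;
    }

    public static void main(String[] args) {
        RestorableCounter counter = new RestorableCounter();
        counter.inc();
        counter.inc();
        long checkpointed = counter.snapshot(); // checkpoint taken after 2 events
        counter.inc();                          // a third event arrives, then the task fails

        // After recovery the counter resumes from the checkpointed value.
        RestorableCounter recovered = RestorableCounter.restore(checkpointed);
        System.out.println(recovered.getCount()); // prints 2
    }
}
```

The third increment is lost in this sketch because it happened after the snapshot. With a replayable source, that event would be expected to be processed again after recovery, which would bring the count back in line.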

How long does your job run for, and how do you access metrics?

In reply to M Singh's message of 27/06/2019 17:36.

Re: Apache Flink - Are counters reliable and accurate ?

M Singh
Hi Chesnay:

Thanks for your response.

My job runs for a few minutes, and I've tried setting the reporter interval to 1 second.

I will try the counter on a longer running job.

Thanks again.

In reply to Chesnay Schepler's message of Thursday, June 27, 2019, 11:46:17 AM EDT.

Re: Apache Flink - Are counters reliable and accurate ?

Chesnay Schepler
So here's the thing: metrics are accurate as long as the job is running. Once the job terminates, metrics are cleaned up and not persisted anywhere, with the exception of a few metrics (like numRecordsIn).

It is also always worth enabling DEBUG logging and re-running your test.

In reply to M Singh's message of 27/06/2019 22:41.