Sink metric numRecordsIn drops temporarily

Philipp Bussche
Hi there, I witnessed an interesting behaviour on my Grafana dashboard where
sink-related metrics show activity where there should not be any. I am
saying this because for this particular sink, activity is triggered by data
being ingested through a cronjob at a particular time; however, the dashboard
shows activity outside this time as well.
I had a closer look: in my graph I am using the NonNegativeDerivative
function (the data actually sits in Graphite) on the metric. Disabling this
function shows that for a short period of time the numRecordsIn counter
drops and then returns to its previous value. This drop then shows up
on the graph and looks like data activity because of the
NonNegativeDerivative function.
Why would the value of a counter temporarily decrease and then go back to
its previous level?
Please see screenshots attached.

Thanks
Philipp

<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t576/Sink_numRecordsIn_NonNegativeDerivative_.png>
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t576/Sink_numRecordsIn_Value_Drops.png>
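
To illustrate why a temporary drop shows up as activity, here is a minimal sketch of a non-negative derivative over a counter series. The function non_negative_derivative is a hypothetical stand-in written for this example, not Graphite's implementation, and the sample values are illustrative, loosely based on the 26.91k and 26.01k figures mentioned later in the thread.

```python
from typing import List, Optional


def non_negative_derivative(values: List[Optional[float]]) -> List[Optional[float]]:
    """Per-step delta of a counter; negative deltas and gaps become None.

    Hypothetical helper for illustration only (an assumption, not
    Graphite's actual code).
    """
    result: List[Optional[float]] = []
    prev: Optional[float] = None
    for val in values:
        if val is None or prev is None:
            # No delta can be computed for the first point or across a gap.
            result.append(None)
        else:
            delta = val - prev
            # A decrease is suppressed instead of being shown as negative.
            result.append(delta if delta >= 0 else None)
        prev = val
    return result


# Illustrative samples: the counter dips once and then recovers.
samples = [26910, 26910, 26010, 26910, 26910]
deltas = non_negative_derivative(samples)
print(deltas)  # [None, 0, None, 900, 0]
```

The drop itself is hidden, but the recovery appears as a positive delta of 900, which a graph would render as apparent record activity even though no new records arrived.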



Re: Sink metric numRecordsIn drops temporarily

Michael Fong
I forgot to cc [hidden email].

Hi, 

Just wondering what the value of that counter is (without applying the NonNegativeDerivative function) when you observe the spikes? If I remember correctly, Grafana is known to aggregate values by averaging them across the selected time range before rendering them in the front end. The charts show values across multiple days; what values does the metric show at minute scale?

Regards,

Michael

(In reply to Philipp Bussche's message of Sun, Sep 17, 2017 at 9:17 PM.)

Re: Sink metric numRecordsIn drops temporarily

Philipp Bussche
Hi, thank you for your answer. On September 11th, the day shown in the
screenshot, the counter was sitting at 26.91k, and when the drop happened
it went down to 26.01k. This happened 3 times during that day, and it
always went back to the same value.
Philipp



Re: Sink metric numRecordsIn drops temporarily

Michael Fong
Hi, 

Thank you for your description. What I am trying to understand is the counter value at the moment of the spikes. Grafana averages consecutive data points before rendering the result in the UI. That is, if the metric is not transmitted continuously, so that some data points appear as zeros, the average over the interval will be lower than the snapshot value. I would suggest first checking the value by zooming in to the finest resolution allowed by the retention policy set in Graphite (per minute or per second, depending on settings).
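
As a rough sketch of the averaging effect described above: the helper consolidate_average and the bucket size of 30 are assumptions made up for this illustration, not Grafana or Graphite APIs, and the data is hypothetical.

```python
from typing import List


def consolidate_average(points: List[float], bucket_size: int) -> List[float]:
    """Average consecutive points into buckets, as a dashboard might do
    when there are more data points than pixels (hypothetical helper)."""
    return [
        sum(points[i:i + bucket_size]) / len(points[i:i + bucket_size])
        for i in range(0, len(points), bucket_size)
    ]


# Hypothetical raw series: a steady counter with one point reported as zero.
raw = [26910] * 29 + [0] + [26910] * 30
averaged = consolidate_average(raw, 30)
print(averaged)  # [26013.0, 26910.0]
```

A single zero among 30 samples is enough to pull the displayed value below the true counter value, which is why checking the raw points at the finest resolution is worthwhile.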


I do not have a concrete answer for that counter in Flink; perhaps someone who knows the semantics of this metric better can comment. However, there is one possibility that we have observed in other Java applications. It usually happens to a fast-growing counter when its next value exceeds the upper bound of its data type. Metrics libraries normally do not reset the value to 0; the value wraps around instead. Taking the long data type as an example, Long.MAX_VALUE + 1 = Long.MIN_VALUE, if I remember correctly. Taking NonNegativeDerivative of that delta then results in a very high peak in the graph.
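
The wrap-around mentioned above can be sketched as follows. Python integers do not overflow, so wrap_int64 is a hypothetical helper that emulates the arithmetic of a signed 64-bit value such as a Java long.

```python
INT64_MIN = -2**63      # corresponds to Java's Long.MIN_VALUE
INT64_MAX = 2**63 - 1   # corresponds to Java's Long.MAX_VALUE


def wrap_int64(x: int) -> int:
    """Emulate signed 64-bit two's-complement overflow (hypothetical helper)."""
    return (x - INT64_MIN) % 2**64 + INT64_MIN


# Incrementing the maximum value wraps around to the minimum value.
wrapped = wrap_int64(INT64_MAX + 1)
print(wrapped == INT64_MIN)  # True

# The raw step between the two consecutive samples is a huge negative delta.
delta = wrapped - INT64_MAX
print(delta == -(2**64 - 1))  # True
```

How that step is rendered depends on how the derivative function is configured to treat counter wrap, so it is worth checking the function's settings in Graphite.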

Hope this helps. 

(In reply to Philipp Bussche's message of Sun, Sep 17, 2017 at 11:02 PM.)