Hi there,

I noticed some interesting behaviour on my Grafana dashboard: sink-related metrics show activity at times when there should not be any. For this particular sink, activity is triggered only by data being ingested through a cron job at a particular time, yet the dashboard also shows activity outside that window.

I had a closer look. In my graph I am applying the nonNegativeDerivative function (the data actually sits in Graphite) to the metric. Disabling this function shows that, for a short period of time, the numRecordsIn counter drops and then returns to its previous value. This drop then shows up on the graph as apparent data activity because of the nonNegativeDerivative function.

Why would the value of a counter temporarily decrease and then go back to its previous level?

Please see the attached screenshots.

Thanks
Philipp

<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t576/Sink_numRecordsIn_NonNegativeDerivative_.png>
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t576/Sink_numRecordsIn_Value_Drops.png>

-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
|
I forgot to cc [hidden email].
|
Hi,

thank you for your answer. On September 11th, the day shown in the screenshot, the counter was sitting at 26.91k and went down to 26.01k when the drop happened. This happened three times during that day, and each time the counter went back to the same value.

Philipp
|
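To make the effect described above concrete, here is a minimal sketch of how a non-negative derivative turns a temporary dip into an apparent burst of activity. It is a simplified re-implementation written for illustration, not Graphite's actual code (it ignores the optional maxValue handling), and the sample series is hypothetical: it only assumes that 26.91k and 26.01k correspond to roughly 26910 and 26010.

```python
from typing import List, Optional


def non_negative_derivative(values: List[Optional[int]]) -> List[Optional[int]]:
    """Simplified sketch: emit the delta to the previous point, dropping negative deltas."""
    result: List[Optional[int]] = []
    prev: Optional[int] = None
    for val in values:
        if prev is None or val is None:
            # No previous point (or a gap): no delta can be computed.
            result.append(None)
        else:
            delta = val - prev
            # A decrease is hidden, but the recovery afterwards is a positive delta.
            result.append(delta if delta >= 0 else None)
        prev = val
    return result


# Hypothetical samples: steady counter, one dipped point, then back to the old value.
counter = [26910, 26910, 26010, 26910, 26910]
deltas = non_negative_derivative(counter)
print(deltas)  # [None, 0, None, 900, 0]
```

The dip itself is suppressed, but the return to the previous value shows up as a delta of 900, which looks like 900 records arriving even though the counter ends where it started.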
Hi,

Thank you for the description. What I was trying to understand is what the counter value is at the moment of those spikes. Grafana averages consecutive data points before rendering the result in the UI. That is, if the metric value is not transmitted continuously and some data points come through as zero, the average over the interval will be lower than the snapshot value. I would suggest first checking the raw value by zooming in to the finest resolution allowed by the retention policy set in Graphite (per minute or per second, depending on settings).

I do not have a concrete answer for that counter in Flink. Perhaps someone who knows the semantics of this metric better can comment.

However, there is one possibility that we have observed in other Java applications. It usually happens to a fast-growing counter when its next value exceeds the upper bound of its type. Normally, a metrics library does not reset the value to 0. Taking the long data type as an example, if I remember correctly, Long.MAX_VALUE + 1 = Long.MIN_VALUE. Taking nonNegativeDerivative (the delta) of such a series therefore results in a very high peak in the graph.

Hope this helps.

On Sun, Sep 17, 2017 at 11:02 PM, Philipp Bussche <[hidden email]> wrote:
> Hi, thank you for your answer. So for September 11th which is shown on the
|
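The two hypotheses in the reply above can be sketched in a few lines. This is an illustration only: the bucket size of 30 points and the single zero sample are assumptions chosen for the example, not values taken from the thread, and since Python integers do not overflow, the Java long wrap-around is emulated by the hypothetical helper wrap_int64.

```python
from typing import List, Optional


def average_bucket(points: List[Optional[float]]) -> Optional[float]:
    """Average the non-null points of one bucket, as an 'average' consolidation would."""
    present = [p for p in points if p is not None]
    return sum(present) / len(present) if present else None


def wrap_int64(n: int) -> int:
    """Emulate two's-complement wrap-around of a 64-bit signed integer (Java long)."""
    return (n + 2**63) % 2**64 - 2**63


LONG_MAX = 2**63 - 1   # Java's Long.MAX_VALUE
LONG_MIN = -2**63      # Java's Long.MIN_VALUE

# Hypothesis 1: one zero sample inside an averaged bucket lowers the displayed value.
avg_steady = average_bucket([26910] * 30)            # 26910.0
avg_with_zero = average_bucket([26910] * 29 + [0])   # 26013.0

# Hypothesis 2: incrementing a long past its maximum wraps to the minimum.
wrapped = wrap_int64(LONG_MAX + 1)

print(avg_steady, avg_with_zero, wrapped == LONG_MIN)
```

With these assumed numbers a single zero sample is enough to pull the averaged value from 26.91k to about 26.01k. The overflow case, by contrast, would only apply to a counter near 2^63, far above the values reported in this thread.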