Proper way to establish bucket counts

Robert Rapplean
I want a count of the events that are put into each bucket of a bucketing sink, but I can't find a ready-made way of doing that. Is there an easier way than implementing a counter for each bucket via the metrics system? If metric counters are the easiest way, how do I make sure that I don't leak memory from expired counters?

Thanks,

Robert

Re: Proper way to establish bucket counts

Fabian Hueske-2
Hi Robert,

Flink collects many metrics by default, including the number of records / events that go into each operator (see [1], System Metrics, IO, "numRecordsIn").
So, you would only need to access that metric.

Best, Fabian

2017-08-01 18:56 GMT+02:00 Robert Rapplean <[hidden email]>:
I want a count of the events that are put into each bucket of a bucketing sink, but I can't find a ready-made way of doing that. Is there an easier way than implementing a counter for each bucket via the metrics system? If metric counters are the easiest way, how do I make sure that I don't leak memory from expired counters?

Thanks,

Robert


Re: Proper way to establish bucket counts

Fabian Hueske-2
Hi Robert,
That's right.

The counts are tracked at the operator level. I think you can get down to the task level, but counts per bucket are not tracked.

Maybe Chesnay (in CC) can help here. He knows the metrics system best. @Chesnay, is there a way to expire metric counters?
Alternatively, you could think about forking off a stateful map (or ProcessFunction) in front of the sink that evaluates the bucketing logic and keeps track of the counts. Then you'd have the bucket counts as another stream.
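
To make the bookkeeping part of that idea concrete, here is a minimal sketch in plain Java with no Flink dependencies. It only shows per-bucket counting plus expiry of buckets that have gone quiet, which is the piece that addresses the memory-leak concern. The class name BucketCounts, its methods, and the inactivity timeout are all hypothetical and chosen for illustration; how this would be wired into a stateful map or ProcessFunction, and how often expire() gets called, is left open here.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical helper (not a Flink class): keeps one counter per bucket and
 * drops counters for buckets that have not received events for a while.
 */
class BucketCounts {

    /** Count and last-activity timestamp for a single bucket. */
    private static final class Counter {
        long count;
        long lastSeenMillis;
    }

    private final long inactivityTimeoutMillis;
    private final Map<String, Counter> counters = new HashMap<>();

    BucketCounts(long inactivityTimeoutMillis) {
        this.inactivityTimeoutMillis = inactivityTimeoutMillis;
    }

    /** Counts one event for the given bucket and returns the bucket's new total. */
    long record(String bucketId, long nowMillis) {
        Counter c = counters.computeIfAbsent(bucketId, k -> new Counter());
        c.count++;
        c.lastSeenMillis = nowMillis;
        return c.count;
    }

    /** Returns the current count for a bucket, or 0 if it is unknown or already expired. */
    long count(String bucketId) {
        Counter c = counters.get(bucketId);
        return c == null ? 0L : c.count;
    }

    /** Removes counters idle for at least the timeout; returns how many were removed. */
    int expire(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Counter>> it = counters.entrySet().iterator();
        while (it.hasNext()) {
            Counter c = it.next().getValue();
            if (nowMillis - c.lastSeenMillis >= inactivityTimeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of buckets that currently have a live counter. */
    int trackedBuckets() {
        return counters.size();
    }
}
```

In a job, record() would be called with the bucket id computed by the same bucketing logic the sink uses, and its return value could be emitted downstream as the bucket-count stream. Note that this sketch keeps the map in ordinary heap memory, so the counts would not survive a failure unless they were kept in Flink-managed state instead.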


2017-08-03 2:49 GMT+02:00 Robert Rapplean <[hidden email]>:
That looks like something that could be used to count the total events in the overall sink, but doesn't look like it works on a per-bucket basis. We're going to try adding a counter to the bucket status object.

On Wed, Aug 2, 2017 at 3:02 AM, Fabian Hueske <[hidden email]> wrote:
Hi Robert,

Flink collects many metrics by default, including the number of records / events that go into each operator (see [1], System Metrics, IO, "numRecordsIn").
So, you would only need to access that metric.

Best, Fabian
