Identifying missing events in keyed streams

Averell
Hi everyone,

I have a keyed stream that expects an event at a fixed interval (let's
say every 1 minute). I want to raise an alarm for any key that has received no
events in n periods. What would be the cheapest way (in terms of performance)
to do this?
I thought of some solutions, but don't know which one is the best:
1. Sliding window then count the number of events in each window <<< this
seems quite expensive when n is big.
2. Register a timer for every single event, record the last event timestamp
and check that timestamp when the timer expires. (This would be the best if
there's an option to cancel/modify a timer, but it seems that feature is not
available yet)
3. Session window: I haven't implemented this to verify its feasibility.
I'm thinking of firing the alarm whenever a session window closes.
4. CEP. I don't know whether it's possible or not. I haven't found a guide for
defining patterns of missing events.

Could you please give me some advice?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Identifying missing events in keyed streams

Fabian Hueske-2
Hi Averell,

I'd go with approach 2). As of Flink 1.6.0 you can delete timers. 

But even if you are on a pre-1.6 version, a ProcessFunction would be the way to go, IMO.
You don't need to register a timer for each event.
Instead, you can register the first timer with the first event and have a state that is updated with the timestamp of the last seen event.
When the timer fires, you check whether you need to raise an alert and, if not, register a new timer such that it fires 1 minute after the last seen event (i.e., at last-seen + 1 minute, which is 1 minute - (now - last-seen) from now).
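
A minimal sketch of the single-timer idea described above, written as a plain-Java simulation rather than actual Flink code. The class `MissedEventMonitor` and its methods are hypothetical names made up for illustration; in a real job the two maps would presumably correspond to keyed state and to timers registered through the `TimerService` of a ProcessFunction on a keyed stream. The sketch also assumes that, once an alert has been raised for a key, no timer is kept until the next event for that key arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of "one timer per key"; hypothetical, not Flink API. */
class MissedEventMonitor {
    private final long timeoutMs;
    // Timestamp of the last event seen per key (stands in for keyed state).
    private final Map<String, Long> lastSeen = new HashMap<>();
    // At most one pending timer per key; TreeMap keeps alert order deterministic.
    private final Map<String, Long> pendingTimer = new TreeMap<>();

    MissedEventMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records an event; only the first event for a key registers a timer. */
    void onEvent(String key, long timestampMs) {
        lastSeen.merge(key, timestampMs, Math::max);
        pendingTimer.putIfAbsent(key, timestampMs + timeoutMs);
    }

    /** Fires every timer due at or before nowMs and returns the keys that alert. */
    List<String> advanceTo(long nowMs) {
        List<String> alerts = new ArrayList<>();
        for (String key : new ArrayList<>(pendingTimer.keySet())) {
            Long fireAt = pendingTimer.get(key);
            while (fireAt != null && fireAt <= nowMs) {
                long deadline = lastSeen.get(key) + timeoutMs;
                if (deadline <= fireAt) {
                    // No event arrived since the timer was set: raise an alert.
                    alerts.add(key);
                    pendingTimer.remove(key);
                    fireAt = null;
                } else {
                    // An event arrived in the meantime: move the timer to
                    // last-seen + timeout instead of alerting.
                    pendingTimer.put(key, deadline);
                    fireAt = deadline;
                }
            }
        }
        return alerts;
    }
}
```

With a 60-second timeout, a key that received events at 0 s and 30 s does not alert at 60 s; its timer is moved to 90 s and alerts then if nothing else arrives.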

Best, Fabian

Replying to Averell <[hidden email]>, message of Thu, 4 Oct 2018, 16:15.

Re: Identifying missing events in keyed streams

Averell
Hi Fabian,

Thanks for the suggestion.
I will try it with the timer-deletion support.

I have also tried approach (3), using session windows, and it works: I set the
session gap to 2 minutes and use an aggregating window function to keep the
amount of in-memory data for each keyed stream to a minimum.
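
To illustrate the session-gap idea in approach (3), here is a rough plain-Java sketch, not Flink windowing code. `SessionGapSketch` and `sessionCloseTimes` are hypothetical names, and the sketch assumes the timestamps of one key arrive in sorted order and that a session closes when the next event is more than the gap away. Like an aggregating function, it keeps only the last timestamp rather than buffering events.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of session gaps for a single key; hypothetical. */
class SessionGapSketch {
    /**
     * Returns, for each session, the time at which it is considered closed
     * (last event of the session + gap). Each close time is a moment where
     * a "missing events" alarm could be fired for this key.
     */
    static List<Long> sessionCloseTimes(long[] sortedTimestampsMs, long gapMs) {
        List<Long> closes = new ArrayList<>();
        if (sortedTimestampsMs.length == 0) {
            return closes;
        }
        // Only the last timestamp is kept, mirroring a minimal aggregate.
        long last = sortedTimestampsMs[0];
        for (int i = 1; i < sortedTimestampsMs.length; i++) {
            long ts = sortedTimestampsMs[i];
            if (ts - last > gapMs) {
                // Silence longer than the gap: the previous session closed.
                closes.add(last + gapMs);
            }
            last = ts;
        }
        // The final session closes once the stream stays quiet for gapMs.
        closes.add(last + gapMs);
        return closes;
    }
}
```

For events at 0, 1, 2, 6 and 7 minutes with a 2-minute gap, the first session closes at minute 4 and the second at minute 9.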

Could you please explain why (2) is better?

Thanks and best regards,
Averell




Re: Identifying missing events in keyed streams

Fabian Hueske-2
I'd go with 2) because the logic is simple and it is (IMO) much easier to understand what is going on and what state is kept.

Replying to Averell <[hidden email]>, message of Thu, 11 Oct 2018, 12:42.

Re: Identifying missing events in keyed streams

Averell
Thank you Fabian.

Tried (2), and it's working well.
I found one more benefit of (2) over (3): it allows me to easily raise
multiple levels of alarms for each keyed stream (e.g., minor: missed 2
cycles; major: missed 5 cycles).
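
As a rough illustration of the multi-level alarms mentioned above, the following plain-Java sketch maps the time since the last event to an alarm level. The names `AlarmLevels`, `missedCycles` and `levelFor` are hypothetical, and "missed cycles" is assumed here to mean the number of whole periods elapsed since the last seen event; the thresholds 2 and 5 come from the example in the message.

```java
/** Hypothetical helper mapping elapsed time to an alarm level. */
class AlarmLevels {
    enum Level { NONE, MINOR, MAJOR }

    /** Whole periods elapsed since the last seen event (never negative). */
    static long missedCycles(long lastSeenMs, long nowMs, long periodMs) {
        return Math.max(0L, (nowMs - lastSeenMs) / periodMs);
    }

    /** Thresholds follow the example: 2 missed cycles is minor, 5 is major. */
    static Level levelFor(long missedCycles) {
        if (missedCycles >= 5) {
            return Level.MAJOR;
        }
        if (missedCycles >= 2) {
            return Level.MINOR;
        }
        return Level.NONE;
    }
}
```

With the timer approach, one way to use this would be to check the level each time the per-key timer fires and to re-register the timer for the next threshold.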

Thanks for your help.

Regards,
Averell


