Any good ideas for online/offline detection of devices that send events?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Any good ideas for online/offline detection of devices that send events?

Bruno Aranda
Hi all,

We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know how often roughly to expect events from those devices and we want to be able to detect when any of them stops (goes offline) or starts again (comes back online) sending events through the pipeline. For instance, if 5 minutes have passed since the last event of a device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The devices events are in order in the partitions and each devices goes to a specific partition, so in theory, we should not have out of order when looking at one partition.

We are assuming a good way to do this is by using sliding windows that are big enough, so we can see the relevant gap before/after the events for each specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno
Reply | Threaded
Open this post in threaded view
|

Re: Any good ideas for online/offline detection of devices that send events?

Tzu-Li (Gordon) Tai
Hi Bruno!

The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind.

Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event downstream.
With this, you can also extend the pattern output stream to detect whether a device has became online again.

Here are some materials for you to take a look at Flink CEP:

The CEP parts in the slides in 2. also provides some good examples of timeout detection using CEP.

Hope this helps!

Cheers,
Gordon

On March 4, 2017 at 1:27:51 AM, Bruno Aranda ([hidden email]) wrote:

Hi all,

We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know how often roughly to expect events from those devices and we want to be able to detect when any of them stops (goes offline) or starts again (comes back online) sending events through the pipeline. For instance, if 5 minutes have passed since the last event of a device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The devices events are in order in the partitions and each devices goes to a specific partition, so in theory, we should not have out of order when looking at one partition.

We are assuming a good way to do this is by using sliding windows that are big enough, so we can see the relevant gap before/after the events for each specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno
Reply | Threaded
Open this post in threaded view
|

Re: Any good ideas for online/offline detection of devices that send events?

Tzu-Li (Gordon) Tai
Some more input:

Right now, you can also use the `ProcessFunction` [1] available in Flink 1.2 to simulate state TTL.
The `ProcessFunction` should allow you to keep device state and simulate the online / offline detection by registering processing timers. In the `onTimer` callback, you can emit the “offline” marker event downstream, and in the `processElement` method, you can emit the “online” marker event if the case is the device has sent an event after it was determined to be offline.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html

On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Hi Bruno!

The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind.

Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event downstream.
With this, you can also extend the pattern output stream to detect whether a device has became online again.

Here are some materials for you to take a look at Flink CEP:

The CEP parts in the slides in 2. also provides some good examples of timeout detection using CEP.

Hope this helps!

Cheers,
Gordon

On March 4, 2017 at 1:27:51 AM, Bruno Aranda ([hidden email]) wrote:

Hi all,

We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know how often roughly to expect events from those devices and we want to be able to detect when any of them stops (goes offline) or starts again (comes back online) sending events through the pipeline. For instance, if 5 minutes have passed since the last event of a device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The devices events are in order in the partitions and each devices goes to a specific partition, so in theory, we should not have out of order when looking at one partition.

We are assuming a good way to do this is by using sliding windows that are big enough, so we can see the relevant gap before/after the events for each specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno
Reply | Threaded
Open this post in threaded view
|

Re: Any good ideas for online/offline detection of devices that send events?

elmosca
Hi Gordon,

Many thanks for your helpful ideas. We tried yesterday the CEP approach, but could not figure it out. The ProcessFunction one looks more promising, and we are investigating it, though we are fighting with some issues related to the event time, where we cannot see so far the timer triggered at the right event time. We are using ascending timestamps, but at the moment we see the timers fired when it is too late. Investigating more.

Thanks,

Bruno

On Tue, 7 Mar 2017 at 07:49 Tzu-Li (Gordon) Tai <[hidden email]> wrote:
Some more input:

Right now, you can also use the `ProcessFunction` [1] available in Flink 1.2 to simulate state TTL.
The `ProcessFunction` should allow you to keep device state and simulate the online / offline detection by registering processing timers. In the `onTimer` callback, you can emit the “offline” marker event downstream, and in the `processElement` method, you can emit the “online” marker event if the case is the device has sent an event after it was determined to be offline.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html


On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Hi Bruno!

The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind.

Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event downstream.
With this, you can also extend the pattern output stream to detect whether a device has became online again.

Here are some materials for you to take a look at Flink CEP:

The CEP parts in the slides in 2. also provides some good examples of timeout detection using CEP.

Hope this helps!

Cheers,
Gordon

On March 4, 2017 at 1:27:51 AM, Bruno Aranda ([hidden email]) wrote:

Hi all,

We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know how often roughly to expect events from those devices and we want to be able to detect when any of them stops (goes offline) or starts again (comes back online) sending events through the pipeline. For instance, if 5 minutes have passed since the last event of a device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The devices events are in order in the partitions and each devices goes to a specific partition, so in theory, we should not have out of order when looking at one partition.

We are assuming a good way to do this is by using sliding windows that are big enough, so we can see the relevant gap before/after the events for each specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno
Reply | Threaded
Open this post in threaded view
|

Re: Any good ideas for online/offline detection of devices that send events?

rmetzger0
pro tip for debugging watermarks: They are exposed via a metric in Flink 1.2. 

On Tue, Mar 7, 2017 at 1:37 PM, Bruno Aranda <[hidden email]> wrote:
Hi Gordon,

Many thanks for your helpful ideas. We tried yesterday the CEP approach, but could not figure it out. The ProcessFunction one looks more promising, and we are investigating it, though we are fighting with some issues related to the event time, where we cannot see so far the timer triggered at the right event time. We are using ascending timestamps, but at the moment we see the timers fired when it is too late. Investigating more.

Thanks,

Bruno

On Tue, 7 Mar 2017 at 07:49 Tzu-Li (Gordon) Tai <[hidden email]> wrote:
Some more input:

Right now, you can also use the `ProcessFunction` [1] available in Flink 1.2 to simulate state TTL.
The `ProcessFunction` should allow you to keep device state and simulate the online / offline detection by registering processing timers. In the `onTimer` callback, you can emit the “offline” marker event downstream, and in the `processElement` method, you can emit the “online” marker event if the case is the device has sent an event after it was determined to be offline.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html


On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Hi Bruno!

The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind.

Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event downstream.
With this, you can also extend the pattern output stream to detect whether a device has became online again.

Here are some materials for you to take a look at Flink CEP:

The CEP parts in the slides in 2. also provides some good examples of timeout detection using CEP.

Hope this helps!

Cheers,
Gordon

On March 4, 2017 at 1:27:51 AM, Bruno Aranda ([hidden email]) wrote:

Hi all,

We are trying to write an online/offline detector for devices that keep streaming data through Flink. We know how often roughly to expect events from those devices and we want to be able to detect when any of them stops (goes offline) or starts again (comes back online) sending events through the pipeline. For instance, if 5 minutes have passed since the last event of a device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The devices events are in order in the partitions and each devices goes to a specific partition, so in theory, we should not have out of order when looking at one partition.

We are assuming a good way to do this is by using sliding windows that are big enough, so we can see the relevant gap before/after the events for each specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno