Re: Poor use case? (late arrival + IoT + windowing)

Re: Poor use case? (late arrival + IoT + windowing)

invalidrecipient
First I was told that my application need only perform keyed aggregation of streaming IoT data on a sliding window. Flink seemed the obvious choice.

Then I was told that the window size must be configurable, taking on one of 5 possible values, anywhere from 5 to 60 minutes. Oh, and configuration changes should take effect immediately. No biggie: I just opted to perform the aggregation against all 5 possible window durations and let the post-processor worry about which outputs were of interest.
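In rough outline, the current job looks something like this (a simplified sketch; Reading, getDeviceId, MyAggregate, MyResult, the specific durations and the one-minute slide are placeholders for my real types and values):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// readings: DataStream<Reading> coming from my source
int[] windowMinutes = {5, 10, 15, 30, 60}; // illustrative; my real 5 values differ

for (int minutes : windowMinutes) {
    // one aggregation per candidate window size
    DataStream<MyResult> results = readings
            .keyBy(Reading::getDeviceId)
            .window(SlidingProcessingTimeWindows.of(Time.minutes(minutes), Time.minutes(1)))
            .aggregate(new MyAggregate(minutes));
    // each results stream is written to the sink, tagged with its window size,
    // and the post-processor picks whichever size is currently configured
}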

Now the latest requirement (and the interesting part): if the IoT devices lose connectivity, they will buffer many days' worth of data until connectivity is restored, at which point all of that buffered data will be transmitted to my application. I believe this implies that event time (as determined by each individual device) must now be taken into consideration, but...


Question 1: Is Flink really the right choice for this application now? Assuming the memory requirements for allowing such late data wouldn't be a deal-breaker, is Flink even capable of tracking event time on a per-device/key basis?
Question 2: Assuming a solution with Flink is suitable, what constructs would I need to leverage? Custom windows maybe? Custom triggers and evictors?


Re: Poor use case? (late arrival + IoT + windowing)

Nicolaus Weidner
Hi,

On Sat, May 15, 2021 at 5:07 PM <[hidden email]> wrote:
> First I was told that my application need only perform keyed aggregation of streaming IoT data on a sliding window. Flink seemed the obvious choice.

> Then I was told that the window size must be configurable, taking on one of 5 possible values, anywhere from 5 to 60 minutes. Oh, and configuration changes should take effect immediately. No biggie: I just opted to perform the aggregation against all 5 possible window durations and let the post-processor worry about which outputs were of interest.

> Now the latest requirement (and the interesting part): if the IoT devices lose connectivity, they will buffer many days' worth of data until connectivity is restored, at which point all of that buffered data will be transmitted to my application. I believe this implies that event time (as determined by each individual device) must now be taken into consideration, but...


> Question 1: Is Flink really the right choice for this application now? Assuming the memory requirements for allowing such late data wouldn't be a deal-breaker, is Flink even capable of tracking event time on a per-device/key basis?

Tracking and using event time for a windowed aggregation is certainly possible. You can check out [1] for an introduction and [2] for some further information on assigning timestamps to events. Of course, the events need to contain some timestamp of when they were produced in the first place, which I assume to be the case.

Next, lateness: You will need to define a "watermark strategy", i.e. a strategy for deciding when it is safe to close a certain window (see [2]). For example, you could decide that events will probably arrive at most 30 minutes out of order, so 30 minutes after seeing an event with timestamp x, windows ending at that timestamp can be closed. Note that key-based watermarks are not currently supported; watermarks are global, not per key.
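As a rough illustration (a minimal, untested sketch; it assumes a hypothetical Reading class whose getTimestampMillis() returns the device-assigned timestamp in epoch millis, and the 30 minutes are just the example bound from above):

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

// readings is the raw DataStream<Reading> from the source.
// Tolerate events arriving up to 30 minutes out of order and take the
// event timestamp from the device's own clock.
WatermarkStrategy<Reading> watermarks = WatermarkStrategy
        .<Reading>forBoundedOutOfOrderness(Duration.ofMinutes(30))
        .withTimestampAssigner((reading, recordTimestamp) -> reading.getTimestampMillis());

DataStream<Reading> withEventTime = readings.assignTimestampsAndWatermarks(watermarks);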
In addition, you can configure "allowed lateness" for windows [3], meaning that a window will fire again with updated results if events arrive after its end watermark has passed and it has already fired once.
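Continuing the sketch above (again only illustrative; the window size, slide, allowed lateness and the MyAggregate/MyResult placeholders would be yours):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Keep window state around for 7 more days after the watermark passes the
// window end, so late events can still update (re-fire) the window.
DataStream<MyResult> results = withEventTime
        .keyBy(Reading::getDeviceId)
        .window(SlidingEventTimeWindows.of(Time.minutes(60), Time.minutes(1)))
        .allowedLateness(Time.days(7))
        .aggregate(new MyAggregate());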
 
> Question 2: Assuming a solution with Flink is suitable, what constructs would I need to leverage? Custom windows maybe? Custom triggers and evictors?

For your use case, you would probably need to allow a lateness of one or even several weeks. This is not necessarily a problem, but it will depend on the type of aggregation you perform: whether all events need to be kept in state or just some aggregate values. There are some tips on state size at the bottom of the page in [3]. If you use RocksDB as the state backend, state will be kept on disk, so memory limitations shouldn't be an issue.
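Enabling RocksDB could look like this (a sketch for Flink 1.13, assuming the flink-statebackend-rocksdb dependency is on the classpath; the checkpoint path is a placeholder):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Keep keyed/window state on disk instead of the JVM heap;
// equivalent to "state.backend: rocksdb" in flink-conf.yaml.
env.setStateBackend(new EmbeddedRocksDBStateBackend());

// Checkpoints still need a durable location (placeholder path).
env.enableCheckpointing(60_000);
env.getCheckpointConfig().setCheckpointStorage("file:///path/to/checkpoints");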
A custom trigger is not strictly required, but it would be helpful: the default EventTimeTrigger would fire once per element of a late batch, whereas you probably want to fire only after no further event has been received for some time span.
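Such a trigger could look roughly like the following (an untested sketch, not an official Flink trigger; the class name and the quiet period are made up). It behaves like EventTimeTrigger for on-time data and, for late data, debounces firings with a per-window processing-time timer:

import java.time.Duration;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class DebouncedLateFiringTrigger extends Trigger<Object, TimeWindow> {

    private final long quietPeriodMillis;

    private final ValueStateDescriptor<Long> pendingTimerDesc =
            new ValueStateDescriptor<>("debounce-timer", Long.class);

    public DebouncedLateFiringTrigger(Duration quietPeriod) {
        this.quietPeriodMillis = quietPeriod.toMillis();
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // Late element: instead of firing immediately, (re)start a quiet-period timer.
            ValueState<Long> pendingTimer = ctx.getPartitionedState(pendingTimerDesc);
            Long previous = pendingTimer.value();
            if (previous != null) {
                ctx.deleteProcessingTimeTimer(previous);
            }
            long fireAt = ctx.getCurrentProcessingTime() + quietPeriodMillis;
            pendingTimer.update(fireAt);
            ctx.registerProcessingTimeTimer(fireAt);
            return TriggerResult.CONTINUE;
        } else {
            // On-time element: fire once when the watermark passes the window end.
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        // Quiet period elapsed after the last late element: emit updated results once.
        ctx.getPartitionedState(pendingTimerDesc).clear();
        return TriggerResult.FIRE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        ValueState<Long> pendingTimer = ctx.getPartitionedState(pendingTimerDesc);
        Long previous = pendingTimer.value();
        if (previous != null) {
            ctx.deleteProcessingTimeTimer(previous);
        }
        pendingTimer.clear();
    }
}

It would be plugged in via .window(...).trigger(new DebouncedLateFiringTrigger(Duration.ofMinutes(5))) together with the allowedLateness setting shown above.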

An alternative would be to route late events to a side output (see also [3]) and process them separately. This may be preferable if late batches are rare, as it won't interfere with the main streaming logic.
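The side-output route could look like this (again only a sketch, using the same placeholder names as above):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// Anonymous subclass so Flink can capture the element type.
final OutputTag<Reading> lateReadings = new OutputTag<Reading>("late-readings") {};

SingleOutputStreamOperator<MyResult> results = withEventTime
        .keyBy(Reading::getDeviceId)
        .window(SlidingEventTimeWindows.of(Time.minutes(60), Time.minutes(1)))
        .sideOutputLateData(lateReadings)
        .aggregate(new MyAggregate());

// Everything that arrived too late for its window ends up in this stream
// and can be handled by separate (e.g. batch-style) logic.
DataStream<Reading> lateStream = results.getSideOutput(lateReadings);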

Hope this helped at least a bit!
Best wishes,
Nico