CEP and slightly out of order elements


Sameer Wadkar
Hi,

When using CEP with event time, I have events that can arrive slightly out of order, and I want to sort them by timestamp within their time windows before applying CEP.

For example, with a CEP pattern window of 5 seconds, I would first sort the events within 10-second tumbling windows:

ds2 = ds.keyBy(/* key selector */).window(TumblingEventTimeWindows.of(Time.seconds(10))).apply(/* sort by timestamp */);
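To illustrate the sorting step in isolation, here is a minimal sketch in plain Java (Java 16+ for records), assuming a single key and millisecond timestamps. It does not use Flink; `WindowSort`, `Event`, and `sortWithinWindows` are hypothetical names. It assigns each event to the 10-second tumbling window that contains it and sorts each window's contents by timestamp, roughly what the window function above is meant to do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch in plain Java (not the Flink API): group events into
// tumbling event-time windows and sort each window's contents by timestamp.
public class WindowSort {

    // Hypothetical event type for a single key: event time in ms plus a label.
    record Event(long timestamp, String label) {}

    // Start of the tumbling window containing `timestamp` (assumes timestamp >= 0).
    static long windowStart(long timestamp, long windowSizeMs) {
        return timestamp - (timestamp % windowSizeMs);
    }

    // Maps each window start to that window's events, sorted by timestamp.
    // The TreeMap keeps the windows themselves in ascending order.
    static Map<Long, List<Event>> sortWithinWindows(List<Event> events, long windowSizeMs) {
        Map<Long, List<Event>> windows = new TreeMap<>();
        for (Event e : events) {
            windows.computeIfAbsent(windowStart(e.timestamp(), windowSizeMs), k -> new ArrayList<>())
                   .add(e);
        }
        for (List<Event> window : windows.values()) {
            window.sort(Comparator.comparingLong(Event::timestamp));
        }
        return windows;
    }

    public static void main(String[] args) {
        // Events arrive slightly out of order.
        List<Event> input = List.of(
            new Event(3_000, "b"),
            new Event(1_000, "a"),
            new Event(12_000, "d"),
            new Event(11_000, "c"));
        // Prints "0 -> [a, b]" and then "10000 -> [c, d]".
        sortWithinWindows(input, 10_000).forEach((start, window) ->
            System.out.println(start + " -> " + window.stream().map(Event::label).toList()));
    }
}
```

Note that sorting only helps within a window: an event that is late enough to land in the next window is not reordered relative to the previous one.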

Next, I assign timestamps and watermarks again on ds2, because all elements emitted by the window will have the same timestamp, WINDOW_END_TIME - 1 ms:

ds2 = ds2.assignTimestampsAndWatermarks(/* re-extract the original event timestamp */);
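A small plain-Java simulation (not Flink code) may make the reason for the second assignment concrete. It assumes, as described above, that window output carries WINDOW_END_TIME - 1 ms, and that the original event time is still available in each element's payload. `ReassignTimestamps`, `Stamped`, and the out-of-orderness bound are hypothetical; the watermark rule shown (highest timestamp seen minus a fixed bound) is one common choice, assumed here for illustration.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch in plain Java (not the Flink API) of why timestamps are
// assigned a second time after the window.
public class ReassignTimestamps {

    // Hypothetical element: the timestamp attached by the stream, plus the
    // original event time that is still available inside the payload.
    record Stamped(long streamTimestamp, long payloadEventTime) {}

    // Last millisecond of a window: WINDOW_END_TIME - 1 ms.
    static long windowMaxTimestamp(long windowStart, long windowSizeMs) {
        return windowStart + windowSizeMs - 1;
    }

    // Simulated window output: every element gets the same window timestamp.
    static List<Stamped> emitFromWindow(List<Long> sortedEventTimes, long windowStart, long windowSizeMs) {
        long ts = windowMaxTimestamp(windowStart, windowSizeMs);
        return sortedEventTimes.stream().map(t -> new Stamped(ts, t)).toList();
    }

    // Simulated second assignment: restore the event time from the payload.
    static List<Stamped> reassign(List<Stamped> in) {
        return in.stream().map(s -> new Stamped(s.payloadEventTime(), s.payloadEventTime())).toList();
    }

    // Watermark under an assumed bound on out-of-orderness:
    // highest timestamp seen so far minus the bound.
    static long watermark(List<Stamped> seen, long maxOutOfOrdernessMs) {
        OptionalLong max = seen.stream().mapToLong(Stamped::streamTimestamp).max();
        return max.isPresent() ? max.getAsLong() - maxOutOfOrdernessMs : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        List<Stamped> out = emitFromWindow(List.of(1_000L, 3_000L), 0, 10_000);
        System.out.println(out.get(0).streamTimestamp() + " " + out.get(1).streamTimestamp()); // 9999 9999
        List<Stamped> fixed = reassign(out);
        System.out.println(fixed.get(0).streamTimestamp() + " " + fixed.get(1).streamTimestamp()); // 1000 3000
        System.out.println(watermark(fixed, 2_000)); // 1000
    }
}
```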

Finally, I apply CEP on ds2 with a WITHIN window of 5 seconds (shorter than the 10-second window used earlier).

My reasoning is that if I use the next() operator in CEP, the events must arrive in timestamp order.
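As a rough illustration of why order matters for next(): strict contiguity means the second event must directly follow the first. The plain-Java sketch below (hypothetical class `StrictNext`, not the Flink CEP API) looks for an "a" immediately followed by a "b" and shows that the same events match or fail to match depending only on the order in which they are seen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the Flink CEP API): strict contiguity in the spirit
// of next(), i.e. report each index where "a" is immediately followed by "b".
public class StrictNext {

    static List<Integer> matchesANextB(List<String> events) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            if (events.get(i).equals("a") && events.get(i + 1).equals("b")) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Arrival order, where "b" overtook the earlier "a": no match.
        System.out.println(matchesANextB(List.of("b", "a", "c"))); // []
        // The same events in timestamp order: one match starting at index 0.
        System.out.println(matchesANextB(List.of("a", "b", "c"))); // [0]
    }
}
```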

Is this the right way to handle the problem? I have heard that assigning watermarks twice can lead to wrong results, but don't I need to assign timestamps again in this scenario?

Thanks,
Sameer





Re: CEP and slightly out of order elements

Till Rohrmann
Hi Sameer,

the CEP operator will take care of ordering the elements. 

Internally, the elements are buffered before being applied to the state machine (the NFA). The operator only applies an element after it has seen a watermark greater than that element's timestamp. Since the buffered elements are kept in a priority queue ordered by timestamp, they are applied in order.
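The buffering described above can be sketched as follows. This is a simplified, hypothetical model in plain Java, not Flink's actual operator code: `WatermarkBuffer`, `onElement`, and `onWatermark` are illustrative names, and a list stands in for the NFA. Following the description, an element is applied only once a watermark greater than its timestamp arrives, and a priority queue ordered by timestamp makes the applied order follow event time even when arrival order does not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of buffer-until-watermark ordering (not Flink's code).
public class WatermarkBuffer {

    record Event(long timestamp, String label) {}

    // Buffered elements, smallest timestamp first.
    private final PriorityQueue<Event> buffer =
        new PriorityQueue<>(Comparator.comparingLong(Event::timestamp));

    // Stand-in for the NFA: records the order in which events are applied.
    final List<Event> applied = new ArrayList<>();

    void onElement(Event e) {
        buffer.add(e);
    }

    // Apply, in timestamp order, every buffered element whose timestamp
    // is smaller than the watermark; later elements stay buffered.
    void onWatermark(long watermark) {
        while (!buffer.isEmpty() && buffer.peek().timestamp() < watermark) {
            applied.add(buffer.poll());
        }
    }

    public static void main(String[] args) {
        WatermarkBuffer op = new WatermarkBuffer();
        op.onElement(new Event(3, "c"));
        op.onElement(new Event(1, "a"));
        op.onElement(new Event(2, "b"));
        op.onWatermark(3);  // applies a and b; c stays buffered
        op.onElement(new Event(4, "d"));
        op.onWatermark(10); // applies c and d
        op.applied.forEach(e -> System.out.println(e.label())); // a, b, c, d
    }
}
```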

Cheers,
Till

In reply to Sameer W, Tue, Oct 11, 2016 at 1:51 PM.






Re: CEP and slightly out of order elements

Sameer Wadkar
Thanks, Till. This is helpful to know.

Sameer

In reply to Till Rohrmann, Tue, Oct 11, 2016 at 12:20 PM.