Early events

Early events

Juan Rodríguez Hortalá
Hi,

Maybe this is already in the documentation, so sorry if I'm asking something obvious. I was thinking that if you have event time then you can also have early events, that is, events whose extracted timestamp is in the future. This might happen in practice, for example with sensors that have a skewed clock and assign future timestamps to their events. I have made a simple test with a time window (https://github.com/juanrh/flink-state-eviction/commit/09c2c1fe1e6068b0703c0833b8a574313cdca5a2), and it looks like Flink treats early events like events generated at the current processing time. What is the expected behaviour of Flink for early events?

Early events might be interesting for generating test data, if Flink were able to buffer those early events until their actual time arrives, although I guess implementing that would probably impact performance in production. But as I say, early events might happen in production, because the devices that generate the events can have wrong clocks or wrong code in general. Maybe a fallback to ingestion time would make sense, and an approximation to that might be implemented with a timestamp extractor that overrides future timestamps with System.currentTimeMillis().
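
To make the fallback idea concrete, here is a minimal sketch of the clamping logic in plain Java, with no Flink dependency. The class and method names (FutureTimestampClamp, clamp) are hypothetical and only for illustration; in a real job this would presumably be called from whatever timestamp extractor the job already uses.

```java
// Sketch only: plain Java, no Flink dependency. The names are hypothetical.
final class FutureTimestampClamp {

    private FutureTimestampClamp() {}

    /**
     * Returns the extracted timestamp unless it lies in the future relative to
     * nowMillis, in which case nowMillis is returned instead.
     */
    static long clamp(long extractedMillis, long nowMillis) {
        return Math.min(extractedMillis, nowMillis);
    }

    /** Convenience overload that uses the wall clock of the local machine. */
    static long clamp(long extractedMillis) {
        return clamp(extractedMillis, System.currentTimeMillis());
    }
}
```

Passing the current time as a parameter keeps the logic testable. Note that this only approximates ingestion time, since the clock read is the one of the machine running the extractor.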

Greetings,

Juan

Re: Early events

Juan Rodríguez Hortalá
Hi,

There was a bug in my code: I was assigning the timestamps wrong, and that is why it looked like early events were assigned processing time. Surprisingly enough, my test works fine with early events. In fact, I have modified my test data generator to generate either early events or late events, and both seem to work fine with my test (https://github.com/juanrh/flink-state-eviction/blob/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/main/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSource.java, https://github.com/juanrh/flink-state-eviction/blob/293fe1cf972b2e4bc6fb4e874eb8ba70c78f7894/src/test/java/com/github/juanrh/streaming/source/EventTimeDelayedElementsSourceTest.java).

Anyway, is this the expected behaviour for early events? Is Flink buffering early events until their future timestamp arrives?

Thanks,

Juan


(In reply to Juan Rodríguez Hortalá's message of Sat, Nov 19, 2016 at 8:31 PM.)

Re: Early events

Aljoscha Krettek
Hi,
Yes, Flink is expected to buffer those until the watermark catches up with their timestamp.
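
As a rough illustration of that buffering, the following toy model in plain Java keeps elements in the tumbling window their timestamp falls into and releases a window only once the watermark reaches its last timestamp. This is not Flink code: ToyEventTimeWindows and its methods are hypothetical names, and the firing rule (watermark >= window end - 1) is an assumption chosen to resemble event-time windowing, not a copy of Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model, not Flink code: elements are bucketed by the window their
// timestamp falls into and released only when the watermark passes that window.
final class ToyEventTimeWindows {
    private final long sizeMillis;
    // Window start -> buffered element timestamps, ordered by window start.
    private final TreeMap<Long, List<Long>> buffered = new TreeMap<>();

    ToyEventTimeWindows(long sizeMillis) {
        this.sizeMillis = sizeMillis;
    }

    /** Buffers an element in the window that contains its timestamp. */
    void add(long timestampMillis) {
        long windowStart = Math.floorDiv(timestampMillis, sizeMillis) * sizeMillis;
        buffered.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(timestampMillis);
    }

    /** Emits, in window order, every window whose last timestamp is <= the watermark. */
    List<List<Long>> advanceWatermark(long watermarkMillis) {
        List<List<Long>> fired = new ArrayList<>();
        while (!buffered.isEmpty()
                && buffered.firstKey() + sizeMillis - 1 <= watermarkMillis) {
            fired.add(buffered.pollFirstEntry().getValue());
        }
        return fired;
    }
}
```

With 1000 ms windows, an element with timestamp 5500 added while the watermark is still below 1000 simply stays in the [5000, 6000) bucket. It is emitted only when the watermark reaches 5999, so in this model an early event costs buffered state rather than causing a wrong result.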

Cheers,
Aljoscha

(In reply to Juan Rodríguez Hortalá's message of Sun, 20 Nov 2016 at 06:18.)

Re: Early events

Juan Rodríguez Hortalá
That makes sense, thanks for your answer.

Greetings,

Juan

(In reply to Aljoscha Krettek's message of Mon, Nov 21, 2016 at 9:11 AM.)