Window stream using timestamp key for time

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Window stream using timestamp key for time

Emmanuel
Hello,

I have used Flink to stream data and do analytics on the stream, using time windows...

Now, this is assuming the data is effectively coming in real time. However I have a use case where the data is 'batched' upstream, and comes in bursts, but has a timestamp.
It obviously messes up the windowed stream assumption. 
(note it is a problem with queuing in Kafka for example when there is any kind of downtime downstream of Kafka: if data accumulates and then is consumed, it is consumed at higher 'speed' than real clock time and statistics do not match reality.)

So my question is:

Is it possible to use a window stream based on a timestamp key for time, as opposed to clock time?

How would one do this with the current API?

Thanks
Emmanuel
Reply | Threaded
Open this post in threaded view
|

Re: Window stream using timestamp key for time

Fabian Hueske-2
Hi Emmanuel,

the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:

1) Event-Time concepts: http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink: http://flink.apache.org/news/2015/12/04/Introducing-windows.html
3) Event-Time example use-case: https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana
4) Code for example: https://github.com/dataArtisans/flink-streaming-demo

Best, Fabian


2016-01-28 23:08 GMT+01:00 Emmanuel <[hidden email]>:
Hello,

I have used Flink to stream data and do analytics on the stream, using time windows...

Now, this is assuming the data is effectively coming in real time. However I have a use case where the data is 'batched' upstream, and comes in bursts, but has a timestamp.
It obviously messes up the windowed stream assumption. 
(note it is a problem with queuing in Kafka for example when there is any kind of downtime downstream of Kafka: if data accumulates and then is consumed, it is consumed at higher 'speed' than real clock time and statistics do not match reality.)

So my question is:

Is it possible to use a window stream based on a timestamp key for time, as opposed to clock time?

How would one do this with the current API?

Thanks
Emmanuel

Reply | Threaded
Open this post in threaded view
|

RE: Window stream using timestamp key for time

Emmanuel
Nice,

you guys rock!



From: [hidden email]
Date: Thu, 28 Jan 2016 23:34:58 +0100
Subject: Re: Window stream using timestamp key for time
To: [hidden email]

Hi Emmanuel,

the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:

1) Event-Time concepts: http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink: http://flink.apache.org/news/2015/12/04/Introducing-windows.html
3) Event-Time example use-case: https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana
4) Code for example: https://github.com/dataArtisans/flink-streaming-demo

Best, Fabian


2016-01-28 23:08 GMT+01:00 Emmanuel <[hidden email]>:
Hello,

I have used Flink to stream data and do analytics on the stream, using time windows...

Now, this is assuming the data is effectively coming in real time. However I have a use case where the data is 'batched' upstream, and comes in bursts, but has a timestamp.
It obviously messes up the windowed stream assumption. 
(note it is a problem with queuing in Kafka for example when there is any kind of downtime downstream of Kafka: if data accumulates and then is consumed, it is consumed at higher 'speed' than real clock time and statistics do not match reality.)

So my question is:

Is it possible to use a window stream based on a timestamp key for time, as opposed to clock time?

How would one do this with the current API?

Thanks
Emmanuel