get start and end time stamp from time window

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

get start and end time stamp from time window

Martin Neumann
Hej,

I have a windowed stream and I want to run a (generic) fold function on it. The result should have the start and the end time stamp of the window as fields (so I can relate it to the original data). Is there a simple way to get the timestamps from within the fold function?

I could find the lowest and the highest ts as part of the fold function but that would not be very accurate especially when I the number of events in the window is low. Also, I want to write in a generic way so I can use it even if the data itself does not contain a time stamp field (running on processing time).

I have looked into using a WindowFunction where I would have access to the start and end timestamp. I have not quite figured out how I would implement a fold function using this. Also, from my understanding this approach would require holding the whole window in memory which is not a good option since the window data can get very large.

Is there a better way of doing this


cheers Martin
Reply | Threaded
Open this post in threaded view
|

Re: get start and end time stamp from time window

Fabian Hueske-2
Hi Martin,

You can use a FoldFunction and a WindowFunction to process the same! window. The FoldFunction is eagerly applied, so the window state is only one element. When the window is closed, the aggregated element is given to the WindowFunction where you can add start and end time. The iterator of the WindowFunction will provide only one (the aggregated) element.

See the apply method on WindowedStream with the following signature:
apply(initialValue: R, foldFunction: FoldFunction[T, R], function: WindowFunction[R, R, K, W]): DataStream[R]

Cheers, Fabian

2016-05-11 20:16 GMT+02:00 Martin Neumann <[hidden email]>:
Hej,

I have a windowed stream and I want to run a (generic) fold function on it. The result should have the start and the end time stamp of the window as fields (so I can relate it to the original data). Is there a simple way to get the timestamps from within the fold function?

I could find the lowest and the highest ts as part of the fold function but that would not be very accurate especially when I the number of events in the window is low. Also, I want to write in a generic way so I can use it even if the data itself does not contain a time stamp field (running on processing time).

I have looked into using a WindowFunction where I would have access to the start and end timestamp. I have not quite figured out how I would implement a fold function using this. Also, from my understanding this approach would require holding the whole window in memory which is not a good option since the window data can get very large.

Is there a better way of doing this


cheers Martin

Reply | Threaded
Open this post in threaded view
|

Re: get start and end time stamp from time window

Martin Neumann
Thanks for the help. I use a Fold and a WindowFunction in conjunction now and it works fine. Though I wish there would be a less complicated way to do this.

cheers Martin

On Thu, May 12, 2016 at 11:59 AM, Fabian Hueske <[hidden email]> wrote:
Hi Martin,

You can use a FoldFunction and a WindowFunction to process the same! window. The FoldFunction is eagerly applied, so the window state is only one element. When the window is closed, the aggregated element is given to the WindowFunction where you can add start and end time. The iterator of the WindowFunction will provide only one (the aggregated) element.

See the apply method on WindowedStream with the following signature:
apply(initialValue: R, foldFunction: FoldFunction[T, R], function: WindowFunction[R, R, K, W]): DataStream[R]

Cheers, Fabian

2016-05-11 20:16 GMT+02:00 Martin Neumann <[hidden email]>:
Hej,

I have a windowed stream and I want to run a (generic) fold function on it. The result should have the start and the end time stamp of the window as fields (so I can relate it to the original data). Is there a simple way to get the timestamps from within the fold function?

I could find the lowest and the highest ts as part of the fold function but that would not be very accurate especially when I the number of events in the window is low. Also, I want to write in a generic way so I can use it even if the data itself does not contain a time stamp field (running on processing time).

I have looked into using a WindowFunction where I would have access to the start and end timestamp. I have not quite figured out how I would implement a fold function using this. Also, from my understanding this approach would require holding the whole window in memory which is not a good option since the window data can get very large.

Is there a better way of doing this


cheers Martin