Hej,
I have a windowed stream and I want to run a (generic) fold function on it. The result should have the start and the end time stamp of the window as fields (so I can relate it to the original data). Is there a simple way to get the timestamps from within the fold function? I could find the lowest and the highest ts as part of the fold function but that would not be very accurate especially when I the number of events in the window is low. Also, I want to write in a generic way so I can use it even if the data itself does not contain a time stamp field (running on processing time). I have looked into using a WindowFunction where I would have access to the start and end timestamp. I have not quite figured out how I would implement a fold function using this. Also, from my understanding this approach would require holding the whole window in memory which is not a good option since the window data can get very large. Is there a better way of doing this cheers Martin |
Hi Martin, You can use a FoldFunction and a WindowFunction to process the same! window. The FoldFunction is eagerly applied, so the window state is only one element. When the window is closed, the aggregated element is given to the WindowFunction where you can add start and end time. The iterator of the WindowFunction will provide only one (the aggregated) element.apply(initialValue: R, foldFunction: FoldFunction[T, R], function: WindowFunction[R, R, K, W]): DataStream[R] 2016-05-11 20:16 GMT+02:00 Martin Neumann <[hidden email]>:
|
Thanks for the help. I use a Fold and a WindowFunction in conjunction now and it works fine. Though I wish there would be a less complicated way to do this.
cheers Martin On Thu, May 12, 2016 at 11:59 AM, Fabian Hueske <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |