The goal is:
* to split data, random-uniformly, across N nodes, * window the data identically on each node, * transform the windows locally on each node, and * merge the N parallel windows into a global window stream, such that one window from each parallel process is merged into a "global window" aggregate I've achieved all but the last bullet point, merging one window from each partition into a globally-aggregated window output stream. To be clear, a rolling reduce won't work because it would aggregate over all previous windows in all partitioned streams, and I only need to aggregate over one window from each partition at a time. Similarly for a fold. The closest I have found is ParallelMerge for ConnectedStreams, but I have not found a way to apply it to this problem. Can flink achieve this? If so, I'd greatly appreciate a point in the right direction. Cheers, -aj |
Maybe this can be done by assigning the same window id to each of the N local windows, and do a .keyBy(windowId)2016-10-06 22:39 GMT+02:00 AJ Heller <[hidden email]>:
|
Thank you Fabian, I think that solves it. I'll need to rig up some tests to verify, but it looks good. I used a RichMapFunction to assign ids incrementally to windows (mapping STREAM_OBJECT to Tuple2<Long, STREAM_OBJECT> using a private long value in the mapper that increments on every map call). It works, but by any chance is there a more succinct way to do it? On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske <[hidden email]> wrote:
|
If you are using time windows, you can access the TimeWindow parameter of the WindowFunction.apply() method. The TimeWindow contains the start and end timestamp of a window (as Long) which can act as keys.2016-10-07 1:06 GMT+02:00 AJ Heller <[hidden email]>:
|
Hi, are you windowing based on event time? Cheers, Aljoscha On Fri, 7 Oct 2016 at 09:28 Fabian Hueske <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |