could I chain two timed window?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

could I chain two timed window?

Jinhua Luo
Hi All,

Given one stream source which generates 20k events/sec, and I need to
aggregate the element count using sliding window of 1 hour size.

The problem is, the window may buffer too many elements (which may
cause a lot of block I/O because of checkpointing?), and in fact it
does not necessary to store them for one hour, because the elements
should get folded incrementally. But unlike Tumbling Window, the
sliding window would save elements for next window, right?

So I am considering kind of workaround, should I chain two window like below:

            .timeWindow(Time.minutes(1))
            ...
            .timeWindow(Time.hours(1), Time.minutes(1))

Here the first window generate 1 minute aggregation units and the
second window provides the sliding output.

Any suggestions? Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: could I chain two timed window?

Fabian Hueske-2
Hi,

sliding windows replicate their records for each window.
If you have use an incrementally aggregating function (ReduceFunction, AggregateFunction) with a sliding, the space requirement should not be an issue because each window stores a single value.
However, this also means that each window performs its aggregations independently from the others. So, if you many concurrent sliding windows, pre-aggregate the records in a tumbling window can reduce the computational effort.

Best, Fabian



2017-12-12 8:10 GMT+01:00 Jinhua Luo <[hidden email]>:
Hi All,

Given one stream source which generates 20k events/sec, and I need to
aggregate the element count using sliding window of 1 hour size.

The problem is, the window may buffer too many elements (which may
cause a lot of block I/O because of checkpointing?), and in fact it
does not necessary to store them for one hour, because the elements
should get folded incrementally. But unlike Tumbling Window, the
sliding window would save elements for next window, right?

So I am considering kind of workaround, should I chain two window like below:

            .timeWindow(Time.minutes(1))
            ...
            .timeWindow(Time.hours(1), Time.minutes(1))

Here the first window generate 1 minute aggregation units and the
second window provides the sliding output.

Any suggestions? Thanks.