streaming window operations

Emmanuel
Hello,

Looking at the window operators, I see things like sum, min, max but they're always for a single 'field'.
Is there an easy way to do stats like min, max, average on a window but on many different fields at once?
Should I split the stream into many parallel streams, one per field, to achieve that?
It sounds like it would be more efficient to parse the many fields and compute the stats in parallel within the same stream, presumably with a custom window operator.

Your thoughts on this?
Thanks

Re: streaming window operations

Gyula Fóra-2
Hello,

Some time ago there was an effort to implement the functionality you want, applying multiple aggregations at once (not just for windows), and at some point it will be in there. Unfortunately, it's not high on the priority list at the moment.

There are different ways of achieving this:

1. Just take your windowed data stream and apply all your transformations to it. If you are using a standard policy like count or time, or any tumbling eviction policy, this should in fact be very efficient. With these policies, data is not replicated over the network, because we reuse the discretizers and also do local pre-reduces.

2. The most efficient way would of course be to write a simple reduce function that implements the intended behaviour. For the basic aggregation types, this is a trivial task.

3. You could of course project the data stream onto different fields and apply windowing and transformations to each of them, but this has a large runtime overhead because the window discretization operators have to be replicated. I would only do this if you have some user-defined trigger and eviction policy.
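
To make option 2 a bit more concrete, here is a rough sketch of what such a reduce function could look like. It is plain Java with no Flink dependency, so it only illustrates the combining logic. The names MultiFieldStats, Stats, reduce and aggregate are made up for this example, and representing each record as a double[] of already-parsed fields is an assumption. In Flink, the body of reduce(a, b) would presumably go into a ReduceFunction applied to the windowed stream.

```java
import java.util.Arrays;
import java.util.List;

public class MultiFieldStats {

    /** Partial result holding min, max and sum per field, plus the record count. */
    public static final class Stats {
        public final double[] min;
        public final double[] max;
        public final double[] sum;
        public final long count;

        Stats(double[] min, double[] max, double[] sum, long count) {
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.count = count;
        }

        /** Wraps a single record (one value per field) as a partial result. */
        public static Stats of(double... fields) {
            return new Stats(fields.clone(), fields.clone(), fields.clone(), 1L);
        }

        /** Average of one field, derived from the running sum and count. */
        public double average(int field) {
            return sum[field] / count;
        }
    }

    /** Combines two partial results; same shape as a reduce function (T, T) -> T. */
    public static Stats reduce(Stats a, Stats b) {
        int n = a.min.length;
        double[] min = new double[n];
        double[] max = new double[n];
        double[] sum = new double[n];
        for (int i = 0; i < n; i++) {
            min[i] = Math.min(a.min[i], b.min[i]);
            max[i] = Math.max(a.max[i], b.max[i]);
            sum[i] = a.sum[i] + b.sum[i];
        }
        return new Stats(min, max, sum, a.count + b.count);
    }

    /** Stand-in for a window: folds all records of the window with reduce. */
    public static Stats aggregate(List<double[]> window) {
        return window.stream()
                .map(Stats::of)
                .reduce(MultiFieldStats::reduce)
                .orElseThrow(() -> new IllegalArgumentException("empty window"));
    }

    public static void main(String[] args) {
        // Three records with two fields each.
        List<double[]> window = Arrays.asList(
                new double[] {1.0, 10.0},
                new double[] {3.0, 4.0},
                new double[] {2.0, 7.0});
        Stats s = aggregate(window);
        for (int i = 0; i < 2; i++) {
            System.out.println("field " + i + ": min=" + s.min[i]
                    + " max=" + s.max[i] + " avg=" + s.average(i));
        }
    }
}
```

Since min, max, sum and count can all be combined pairwise, all fields are handled in one pass over the window and the average falls out of sum and count at the end, so there is no need to split the stream per field.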

I hope this helped.

Cheers,
Gyula 

On Fri, Mar 27, 2015 at 9:36 PM, Emmanuel <[hidden email]> wrote:
Hello,

Looking at the window operators, I see things like sum, min, max but they're always for a single 'field'.
Is there an easy way to do stats like min, max, average on a window but on many different fields at once?
Should I split the stream into many parallel streams, one per field, to achieve that?
It sounds like it would be more efficient to parse the many fields and compute the stats in parallel within the same stream, presumably with a custom window operator.

Your thoughts on this?
Thanks