Hi, in Flink streaming, what's the best way of counting the number of tuples within a window of 10 seconds? Using a map-reduce task? I'm asking because in Spark there is the method rawStream.countByWindow(Seconds(x)).
Hi Saiph, you can do it the following way:
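A minimal sketch, assuming the Scala DataStream API; the socket source and the identity key selector are just placeholders:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // placeholder source; substitute your real input stream
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999)

    // one count per key and per tumbling 10-second window
    val counts: DataStream[Long] = stream
      .keyBy(record => record)              // placeholder key selector
      .timeWindow(Time.seconds(10))
      .fold(0L) { (count, _) => count + 1 }

(fold was the window aggregation of that era; later Flink versions deprecate it in favour of AggregateFunction.)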
Cheers,

On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa wrote:
Why the ".keyBy"? I don't want to count tuples by Key. I simply want to count all tuples that are contained in a window. On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <[hidden email]> wrote:
|
Then go for:
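A minimal sketch, again assuming the Scala API and a placeholder socket source:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999) // placeholder source

    // one global count per tumbling 10-second window; no keyBy involved
    val counts: DataStream[Long] = stream
      .timeWindowAll(Time.seconds(10))
      .fold(0L) { (count, _) => count + 1 }

    counts.print()
    env.execute("count per window")

An all-window without a keyBy is evaluated by a single task, which is what the rest of the thread turns on.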
On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa wrote:
That code will not run in parallel, right? So a map-reduce task would yield better performance, no?

On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen wrote:
True, at this point it does not pre-aggregate in parallel. That is actually a feature on the list, but not yet added...

On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa wrote:
Hey, I am wondering if the following code would give the same result but more efficiently (in parallel):

input.keyBy(assignRandomKey).timeWindow(Time.seconds(10)).reduce(count).timeWindowAll(Time.seconds(10)).reduce(count)

Effectively just assigning random keys to do the pre-aggregation and then doing a window over the pre-aggregated values. I wonder whether this actually leads to correct results and how it interplays with the time semantics.

Cheers,
Gyula

On Fri, Feb 26, 2016 at 7:10 PM, Stephan Ewen wrote:
Yes, Gyula, that should work. I would draw the random key from a range of 10 * parallelism.

On Fri, Feb 26, 2016 at 7:16 PM, Gyula Fóra wrote:
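To close the loop, here is a fuller sketch of the two-stage count Gyula describes, with Stephan's 10 * parallelism key range; it assumes the Scala API, and the source and job name are placeholders:

    import scala.util.Random

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999) // placeholder source

    // random keys drawn from 10 * parallelism, as suggested above
    val keyRange = env.getParallelism * 10

    val total: DataStream[Long] = stream
      .map(_ => (Random.nextInt(keyRange), 1L)) // attach a random key and a count of 1
      .keyBy(_._1)
      .timeWindow(Time.seconds(10))
      .sum(1)                                   // parallel partial counts, one per random key
      .map(_._2)                                // drop the key, keep the partial count
      .timeWindowAll(Time.seconds(10))          // combine at most keyRange partials in one task
      .reduce(_ + _)

    total.print()
    env.execute("parallel window count")

With event time, the pre-aggregated records should carry the end timestamp of the first window, so the two windowing stages line up; with processing time, a partial count can slip into the following all-window.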