Hi, in Flink streaming, what's the best way of counting the number of tuples within a window of 10 seconds? Using a map-reduce task? I'm asking because in Spark there is the method rawStream.countByWindow(Seconds(x)).
Hi Saiph, you can do it the following way:
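A minimal sketch, assuming the Scala DataStream API; the socket source and the identity key selector are just placeholders:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // placeholder source; substitute your real input stream
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999)

    // one count per key and per tumbling 10-second window
    val counts: DataStream[Long] = stream
      .keyBy(record => record)              // placeholder key selector
      .timeWindow(Time.seconds(10))
      .fold(0L) { (count, _) => count + 1 }

(fold was the window aggregation of that era; later Flink versions deprecate it in favour of AggregateFunction.)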
Cheers,

On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa wrote:
Why the ".keyBy"? I don't want to count tuples by Key. I simply want to count all tuples that are contained in a window. On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <[hidden email]> wrote:
|
Then go for:
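A minimal sketch, again assuming the Scala API and a placeholder socket source:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999) // placeholder source

    // one global count per tumbling 10-second window; no keyBy involved
    val counts: DataStream[Long] = stream
      .timeWindowAll(Time.seconds(10))
      .fold(0L) { (count, _) => count + 1 }

    counts.print()
    env.execute("count per window")

An all-window without a keyBy is evaluated by a single task, which is what the rest of the thread turns on.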
On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa wrote:
That code will not run in parallel, right? So a map-reduce task would yield better performance, no?

On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen wrote:
True, at this point it does not pre-aggregate in parallel. That is actually a feature on the list, but not yet added...

On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa wrote:
Hey, I am wondering if the following code would give the same result but more efficiently (in parallel):

input.keyBy(assignRandomKey).timeWindow(Time.seconds(10)).reduce(count).timeWindowAll(Time.seconds(10)).reduce(count)

Effectively just assigning random keys to do the pre-aggregation and then doing a window over the pre-aggregated values. I wonder whether this actually leads to correct results and how it interplays with the time semantics.

Cheers,
Gyula

On Fri, Feb 26, 2016 at 7:10 PM, Stephan Ewen wrote:
Yes, Gyula, that should work. I would draw the random key from a range of 10 * parallelism.

On Fri, Feb 26, 2016 at 7:16 PM, Gyula Fóra wrote:
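To close the loop, here is a fuller sketch of the two-stage count Gyula describes, with Stephan's 10 * parallelism key range; it assumes the Scala API, and the source and job name are placeholders:

    import scala.util.Random

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[String] = env.socketTextStream("localhost", 9999) // placeholder source

    // random keys drawn from 10 * parallelism, as suggested above
    val keyRange = env.getParallelism * 10

    val total: DataStream[Long] = stream
      .map(_ => (Random.nextInt(keyRange), 1L)) // attach a random key and a count of 1
      .keyBy(_._1)
      .timeWindow(Time.seconds(10))
      .sum(1)                                   // parallel partial counts, one per random key
      .map(_._2)                                // drop the key, keep the partial count
      .timeWindowAll(Time.seconds(10))          // combine at most keyRange partials in one task
      .reduce(_ + _)

    total.print()
    env.execute("parallel window count")

With event time, the pre-aggregated records should carry the end timestamp of the first window, so the two windowing stages line up; with processing time, a partial count can slip into the following all-window.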