Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

Nguyen, Michael

Hello everybody,

Has anyone tried testing AggregateFunction() and ProcessWindowFunction() on a KeyedDataStream? I have reviewed the testing page on Flink’s official website (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html), but I am not sure how to test these two functions when they are combined in a single .aggregate() operator.

 

Here’s how I am using the AggregateFunction (EventCountAggregate()) and ProcessWindowFunction (CalculateWindowTotal()) in my Flink job:

DataStream<Tuple2<Date, Integer>> ec2EventsAggregate =
        ec2Events
                .keyBy(t -> t.f0)
                .timeWindow(Time.minutes(30))
                .aggregate(new EventCountAggregate(), new CalculateWindowTotal())
                .name("EC2 creation interval count");

 

 

EventCountAggregate() counts each element in the ec2Events data stream.

CalculateWindowTotal() takes the end timestamp of each 30-minute window and pairs it with the number of elements counted so far for that window, returning a Tuple2 containing the end timestamp and the element count.
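To make the description above concrete, here is a minimal plain-Java sketch of the same computation with no Flink dependency. It assumes event timestamps in epoch milliseconds, windows aligned to the epoch with no offset, and non-negative timestamps, and it ignores the keyBy step. The names `WindowCountSketch`, `windowEnd`, and `countPerWindow` are made up for illustration and do not come from the job above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "count events per 30-minute tumbling window".
// Hypothetical helper, not Flink code: names and structure are illustrative only.
final class WindowCountSketch {

    static final long WINDOW_MS = 30L * 60L * 1000L;

    // Exclusive end of the epoch-aligned tumbling window containing timestampMs.
    // Assumes timestampMs >= 0 and no window offset.
    static long windowEnd(long timestampMs) {
        long windowStart = timestampMs - (timestampMs % WINDOW_MS);
        return windowStart + WINDOW_MS;
    }

    // Maps each window end timestamp to the number of events in that window.
    static Map<Long, Integer> countPerWindow(List<Long> eventTimestampsMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestampsMs) {
            counts.merge(windowEnd(ts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two events fall into the first window, one into the second.
        List<Long> events = Arrays.asList(0L, 1_000L, WINDOW_MS + 5L);
        // Prints:
        //   window end=1800000 count=2
        //   window end=3600000 count=1
        countPerWindow(events).forEach((end, count) ->
                System.out.println("window end=" + end + " count=" + count));
    }
}
```

Each map entry corresponds to one (end timestamp, count) pair of the kind the job emits as a Tuple2, here with a plain long instead of a Date.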

 

 

Thanks,

Michael

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

vino yang
Hi Michael,

You may want to look at the `KeyedOneInputStreamOperatorTestHarness` test class.

As an example, see `WindowTranslationTest#testAggregateWithWindowFunctionEventTime` or `WindowTranslationTest#testAggregateWithWindowFunctionProcessingTime` [1]; both of them call `processElementAndEnsureOutput`.
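Alongside the harness-based approach, the counting logic can often be checked on its own, since an aggregate function is just four methods operating on an accumulator. The sketch below relies on that. To compile without Flink on the classpath it declares a local stand-in interface, `AggregateSketch`, with the same four method names as Flink's `AggregateFunction` (`createAccumulator`, `add`, `getResult`, `merge`); it is not the Flink interface. The body of `EventCountSketch` is only a guess at what a counting aggregate might look like, because the thread does not show the real `EventCountAggregate`.

```java
import java.util.Arrays;

// Local stand-in that mirrors the method names of Flink's AggregateFunction.
// It is NOT the Flink interface; a real job would implement
// org.apache.flink.api.common.functions.AggregateFunction instead.
interface AggregateSketch<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}

// Hypothetical counting aggregate: ignores the element and increments a counter.
final class EventCountSketch<T> implements AggregateSketch<T, Integer, Integer> {
    @Override public Integer createAccumulator() { return 0; }
    @Override public Integer add(T value, Integer accumulator) { return accumulator + 1; }
    @Override public Integer getResult(Integer accumulator) { return accumulator; }
    @Override public Integer merge(Integer a, Integer b) { return a + b; }
}

final class EventCountSketchCheck {

    // Drives the aggregate the way a window would: one add() call per element,
    // then a single getResult() on the final accumulator.
    static <T> int countAll(AggregateSketch<T, Integer, Integer> agg, Iterable<T> elements) {
        Integer acc = agg.createAccumulator();
        for (T element : elements) {
            acc = agg.add(element, acc);
        }
        return agg.getResult(acc);
    }

    public static void main(String[] args) {
        EventCountSketch<String> agg = new EventCountSketch<>();
        System.out.println(countAll(agg, Arrays.asList("a", "b", "c"))); // prints 3
        System.out.println(agg.merge(2, 3));                             // prints 5
    }
}
```

A check like this covers only the aggregate function. The window function and the wiring through `.aggregate(...)` still need an operator-level test such as the harness-based one mentioned above.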


Best,
Vino

Nguyen, Michael <[hidden email]> wrote on Monday, October 28, 2019 at 3:18 PM:


Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

Nguyen, Michael

Hi Vino,

 

This is a great example – thank you!

 

It looks like I need to instantiate a StreamExecutionEnvironment in order to get my OneInputStreamOperator. Would I need to set up a local Flink cluster using MiniClusterWithClientResource in order to use StreamExecutionEnvironment?

 

 

Best,

Michael

 

 

From: vino yang <[hidden email]>
Date: Monday, October 28, 2019 at 1:32 AM
To: Michael Nguyen <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

 


Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

vino yang
Hi Michael,

I did not see any mini-cluster initialization in WindowTranslationTest. Since the test exercises a single operator, it seems the operator test harness already provides the necessary infrastructure.

Give it a try and see whether anything is missing.

Best,
Vino

Nguyen, Michael <[hidden email]> wrote on Monday, October 28, 2019 at 4:51 PM:
