Re: Use of AggregateFunction's merge() method

Posted by Fabian Hueske-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Use-of-AggregateFunction-s-merge-method-tp19978p19986.html

Hi Ken,

You are right. The merge() method combines partial aggregates, similar to a combinable reducer.

The only situation when merge() is called in a DataStream job (that I am aware of) is when session windows get merged.
For example when you define a session window with 30 minute gap and you receive the following records
R1, 12:00:00
R2, 12:05:00
R3, 12:40:00
R4, 12:20:00

In this case, Flink R1 will create a new window W1, R2 will be assigned to W1, R3 creates a new window W2, and R4 connects and merges W1 and W2.

Best, Fabian

2018-05-05 0:46 GMT+02:00 Ken Krugler <[hidden email]>:
I’m trying to figure out when/why the AggregateFunction.merge() method is called in a streaming job, to ensure I’ve implemented it properly.

The documentation for AggregateFunction says "Merging intermediate aggregates (partial aggregates) means merging the accumulators.”

But that sounds more like a combiner in batch processing, not streaming.

From the code, it seems like this could be called if a MergingWindowAssigner is used, right?

And is there any other situation in streaming where merge() could be called?

Thanks,

— Ken

--------------------------------------------
+1 530-210-6378