Re: windowAll and AggregateFunction

Posted by Ken Krugler on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/windowAll-and-AggregateFunction-tp25414p25425.html

Hi there,

You should be able to use a regular time-based window(), and emit the HyperLogLog binary data as your result, which then would get merged in your custom function (which you set a parallelism of 1 on).

Note that if you are generating unique counts per non-overlapping time window, you’ll need to keep N HLL structures in each operator.

— Ken


On Jan 9, 2019, at 10:26 AM, CPC <[hidden email]> wrote:

Hi Stefan,

Could i use "Reinterpreting a pre-partitioned data stream as keyed stream" feature for this?

On Wed, 9 Jan 2019 at 17:50, Stefan Richter <[hidden email]> wrote:
Hi,

I think your expectation about windowAll is wrong, from the method documentation: “Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance” and I also cannot think of a way in which the windowing API would support your use case without a shuffle. You could probably build the functionality by hand through, but I guess this is not quite what you want.

Best,
Stefan

> On 9. Jan 2019, at 13:43, CPC <[hidden email]> wrote:
>
> Hi all,
>
> In our implementation,we are consuming from kafka and calculating distinct with hyperloglog. We are using windowAll function with a custom AggregateFunction but flink runtime shows a little bit unexpected behavior at runtime. Our sources running with parallelism 4 and i expect add function to run after source calculate partial results and at the end of the window i expect it to send 4 hll object to single operator to merge there(merge function). Instead, it sends all data to single instance and call add function there.
>
> Is here any way to make flink behave like this? I mean calculate partial results after consuming from kafka with paralelism of sources without shuffling(so some part of the calculation can be calculated in parallel) and merge those partial results with a merge function?
>
> Thank you in advance...


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra