Re: AllWindowed vs Windowed with 1 key

Posted by Adrienne Kole on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/AllWindowed-vs-Windowed-with-1-key-tp13039p13048.html

Hi,

Thanks for the reply. So I have 2 cases:

1. timeWindowAll (length, slide).reduce (...) (with parallelism = 1)
2. groupby(someField).timeWindow(length, slide). reduce(...)

Lets say case-1 global window, case-2 partitioned window. If I have only one key (for case-2) and I set parallelism=1  for case-1, I would expect that both cases have similar performance both in terms of latency and throughput. However, partitioned windows outperform global ones by orders of magnitude in terms of throughput.
I am using Flink 1.1.3.


Thanks,
Adrienne


 

On Mon, May 8, 2017 at 3:55 PM, Stefan Richter <[hidden email]> wrote:
Hi,

to answer this question, we would first need to know what you mean by „global windows“: using „windowAll()“ or „GlobalWindows“? Also, the answer might depend on the Flink version that you are using.

Best,
Stefan

> Am 07.05.2017 um 23:23 schrieb Adrienne Kole <[hidden email]>:
>
> Hi,
>
> I am doing simple aggregation with a keyed and global windows in flink.
> When I compare the keyed window aggregation with 1 key and global window (which has parallelism 1) I would expect that both of them would have similar performance.
>
> However, keyed stream with 1 key performs with 2x more throughput than global window.
> My configuration is 8 node cluster, 16 core in each node, parallelism = 128.
>
> AFAIK, Flink doesn't manage skew by default and uses hash function to assign keys to partitions. So if I have 1 key only, it should go to only one partition always, which is semantically similar to global windows in flink.
>
> What can be the reason behind this difference in performance?
>
> Thanks,
> Adrienne