AllWindowed vs Windowed with 1 key

Adrienne Kole
Hi,

I am doing a simple aggregation with keyed and global windows in Flink.
When I compare keyed window aggregation over a single key with global window aggregation (which has parallelism 1), I would expect both to perform similarly.

However, the keyed stream with one key achieves about 2x the throughput of the global window.
My setup is an 8-node cluster with 16 cores per node and parallelism = 128.

AFAIK, Flink doesn't handle skew by default and uses a hash function to assign keys to partitions. So if I have only one key, all records should always go to a single partition, which is semantically similar to global windows in Flink.
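
The expectation above can be illustrated with a small plain-Java sketch. It is not Flink's actual partitioning code: `SingleKeyPartitioning`, `partitionFor` and `partitionsUsed` are hypothetical names, and the "hash modulo parallelism" rule is a simplifying assumption. It only shows that a deterministic hash sends every record of a single key to the same partition, so the other 127 subtasks stay idle.

```java
import java.util.HashSet;
import java.util.Set;

final class SingleKeyPartitioning {
    // Simplified stand-in for hash partitioning (assumption, not Flink's code):
    // the partition index is the key's hash modulo the parallelism.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Routes the same key `records` times and collects the distinct partitions hit.
    static Set<Integer> partitionsUsed(Object key, int parallelism, int records) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < records; i++) {
            used.add(partitionFor(key, parallelism));
        }
        return used;
    }

    public static void main(String[] args) {
        Set<Integer> used = partitionsUsed("only-key", 128, 1_000_000);
        // A single key always maps to exactly one of the 128 partitions.
        System.out.println("distinct partitions used: " + used.size());
    }
}
```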

What can be the reason behind this difference in performance?

Thanks,
Adrienne
Re: AllWindowed vs Windowed with 1 key

Stefan Richter
Hi,

To answer this question, we would first need to know what you mean by "global windows": are you using "windowAll()" or "GlobalWindows"? Also, the answer might depend on the Flink version that you are using.

Best,
Stefan

> On 07.05.2017 at 23:23, Adrienne Kole <[hidden email]> wrote:
> [...]

Re: AllWindowed vs Windowed with 1 key

Adrienne Kole
Hi,

Thanks for the reply. So I have 2 cases:

1. timeWindowAll(length, slide).reduce(...) (with parallelism = 1)
2. keyBy(someField).timeWindow(length, slide).reduce(...)

Let's call case 1 the global window and case 2 the partitioned window. If I have only one key (for case 2) and I set parallelism = 1 for case 1, I would expect both cases to perform similarly in terms of both latency and throughput. However, partitioned windows outperform global ones by orders of magnitude in terms of throughput.
I am using Flink 1.1.3.
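
For context on why the two cases should cost about the same per record: with a sliding window of a given length and slide, each record belongs to roughly length / slide windows, whether or not the stream is keyed. The plain-Java sketch below illustrates that assignment. `SlidingWindowAssignment` and `assignWindows` are hypothetical names, and the logic is a simplified assumption about how sliding windows are aligned, not code taken from Flink.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowAssignment {
    // Returns every [start, end) window of the given length that contains the
    // timestamp, assuming window starts are aligned to multiples of `slide`.
    static List<long[]> assignWindows(long timestamp, long length, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest aligned window start that is not after the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = lastStart; start > timestamp - length; start -= slide) {
            windows.add(new long[] {start, start + length});
        }
        return windows;
    }

    public static void main(String[] args) {
        // length = 10, slide = 5: the record at t = 7 falls into two windows.
        for (long[] w : assignWindows(7, 10, 5)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

The same per-record work applies to case 1 and case 2, which is why a large throughput gap between them is surprising.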


Thanks,
Adrienne


 

On Mon, May 8, 2017 at 3:55 PM, Stefan Richter <[hidden email]> wrote:
[...]


Re: AllWindowed vs Windowed with 1 key

Stefan Richter
That is interesting, because already in Flink 1.1.x, windowAll() is implemented as input.keyBy(new DummyKeySelector()).window(). Are you using event time or processing time, and most importantly, do the execution graphs in the web frontend look different for the two variants?
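
The idea of expressing a global window as a keyed window with a dummy key can be sketched in plain Java, without any Flink dependency. All names here (`DummyKeyGrouping`, `groupByKey`, `groupAll`, `DUMMY_KEY`) are hypothetical and only mirror the description above: a key selector that returns the same constant for every element puts all elements into one group, which is what a single real key does as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class DummyKeyGrouping {
    // Hypothetical constant key: every element is assigned to it.
    static final Byte DUMMY_KEY = 0;

    // Groups elements by whatever key the selector returns, as a keyed operator would.
    static <T, K> Map<K, List<T>> groupByKey(List<T> input, Function<T, K> keySelector) {
        Map<K, List<T>> groups = new HashMap<>();
        for (T element : input) {
            groups.computeIfAbsent(keySelector.apply(element), k -> new ArrayList<>()).add(element);
        }
        return groups;
    }

    // "Global" grouping expressed as keyed grouping with a constant key.
    static <T> Map<Byte, List<T>> groupAll(List<T> input) {
        return groupByKey(input, element -> DUMMY_KEY);
    }

    public static void main(String[] args) {
        Map<Byte, List<Integer>> groups = groupAll(Arrays.asList(3, 1, 4, 1, 5));
        // All elements end up under the single dummy key.
        System.out.println(groups);
    }
}
```

If both variants reduce to the same keyed plan, the execution graphs should look alike, which is why comparing them in the web frontend is a useful first check.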

On 08.05.2017 at 16:51, Adrienne Kole <[hidden email]> wrote:
[...]