Eliminating Shuffling Under FlinkSQL

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Eliminating Shuffling Under FlinkSQL

Aeden Jameson
It's my understanding that a group by is also a key by under the hood.
As a result that will cause a shuffle operation to happen. Our source
is a Kafka topic that is keyed so that any give partition contains all
the data that is needed for any given consuming TM. Is there a way
using FlinkSQL to eliminate the shuffle operation? Or I'm missing
details other details that would make such a change undesirable?

Thank you,
Aeden
Reply | Threaded
Open this post in threaded view
|

Re: Eliminating Shuffling Under FlinkSQL

Dawid Wysakowicz-2
Your understanding of a group by is correct. It is equivalent to a key
by. I agree it would be a great feature to keep the Source's
partitioning but unfortunately as of now it is not yet supported.

Best,

Dawid

On 18/03/2021 18:28, Aeden Jameson wrote:
> It's my understanding that a group by is also a key by under the hood.
> As a result that will cause a shuffle operation to happen. Our source
> is a Kafka topic that is keyed so that any give partition contains all
> the data that is needed for any given consuming TM. Is there a way
> using FlinkSQL to eliminate the shuffle operation? Or I'm missing
> details other details that would make such a change undesirable?
>
> Thank you,
> Aeden


OpenPGP_signature (855 bytes) Download Attachment