(DEPRECATED) Apache Flink User Mailing List archive.

Eliminating Shuffling Under FlinkSQL


Aeden Jameson


It's my understanding that a GROUP BY is also a keyBy under the hood.
As a result, it will cause a shuffle. Our source is a Kafka topic that
is keyed so that any given partition contains all the data needed by
whichever TaskManager (TM) consumes it. Is there a way in FlinkSQL to
eliminate the shuffle? Or am I missing other details that would make
such a change undesirable?

Thank you,
Aeden

Dawid Wysakowicz

Re: Eliminating Shuffling Under FlinkSQL

Your understanding of GROUP BY is correct: it is equivalent to a keyBy.
I agree it would be a great feature to keep the source's partitioning,
but unfortunately it is not supported yet.
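A minimal, self-contained Java sketch of why a GROUP BY on an already-keyed topic still moves data: each record is read by the subtask that owns its source partition, but the grouping operator routes every key with its own hash function. Both partitioner functions below are hypothetical stand-ins chosen for illustration; they are not Kafka's default partitioner and not Flink's key-group assignment. "Keeping the source's partitioning", the feature described in the reply as unsupported, corresponds to the `reuseSourcePartitioning = true` case.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Flink or Kafka code): records are placed on source subtasks by one
 * hash function, then a GROUP BY / keyBy re-partitions them with a different one.
 */
public class ShuffleModel {

    // Hypothetical stand-in for the producer-side partitioner (NOT Kafka's real one).
    public static int sourcePartition(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Hypothetical stand-in for the engine's key-based routing (NOT Flink's key groups).
    // Mixing higher bits into the result makes the assignment usually differ from sourcePartition.
    public static int groupByPartition(String key, int parallelism) {
        int h = key.hashCode() * 0x9E3779B9;
        return Math.floorMod(h ^ (h >>> 16), parallelism);
    }

    // Counts records whose GROUP BY subtask differs from the subtask that read them,
    // i.e. records that would have to cross the network in a shuffle.
    public static int countMoved(List<String> keys, int parallelism, boolean reuseSourcePartitioning) {
        int moved = 0;
        for (String key : keys) {
            int from = sourcePartition(key, parallelism);
            int to = reuseSourcePartitioning ? from : groupByPartition(key, parallelism);
            if (from != to) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("user-" + i);
        }
        int parallelism = 4;
        System.out.println("moved with independent hashing: "
                + countMoved(keys, parallelism, false));
        System.out.println("moved when source partitioning is reused: "
                + countMoved(keys, parallelism, true));
    }
}
```

Running `main` prints the two counts: with independent hashing a large share of the records typically land on a different subtask, while reusing the source partitioning moves none. The model only illustrates the data movement; it says nothing about how the Flink planner decides where to place an exchange.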

Best,

Dawid

On 18/03/2021 18:28, Aeden Jameson wrote:
> It's my understanding that a group by is also a key by under the hood.
> [...]
