In batch mode, if the input is sorted prior to a group-by operation, does Flink forward the aggregated data early? Is there a way to prevent grouping operations from buffering all data in a GBK operation in batch mode?
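The combine() method is executed on the sender side, which reduces the amount of data that has to be shipped and spilled to disk. This only works if your data allows such early aggregation, i.e. if partial results can be merged into the final result (as with sums or counts).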
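To illustrate the idea, here is a minimal plain-Java sketch of sender-side pre-aggregation. It does not use the Flink API: the class CombineSketch and the helpers rec, combine and reduce are hypothetical names made up for this example, and the aggregation is assumed to be a per-key sum. Each partition is first collapsed to one partial sum per key, and only those partials are merged on the receiving side.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombineSketch {

    // Hypothetical helper: builds one (key, value) record.
    static Map.Entry<String, Long> rec(String key, long value) {
        return new SimpleEntry<>(key, Long.valueOf(value));
    }

    // "Sender side": pre-aggregate one partition into a partial sum per key.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> partition) {
        Map<String, Long> partial = new TreeMap<>();
        for (Map.Entry<String, Long> r : partition) {
            partial.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return partial;
    }

    // "Receiver side": merge the partial sums coming from all senders.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions holding five records in total.
        List<Map.Entry<String, Long>> p1 =
                Arrays.asList(rec("a", 1), rec("a", 2), rec("b", 5));
        List<Map.Entry<String, Long>> p2 =
                Arrays.asList(rec("a", 4), rec("b", 1));

        Map<String, Long> c1 = combine(p1);
        Map<String, Long> c2 = combine(p2);

        // Only the partial sums are shipped: four records instead of five.
        System.out.println("shipped records: " + (c1.size() + c2.size()));
        System.out.println("result: " + reduce(Arrays.asList(c1, c2)));
    }
}
```

This works because a sum can be computed from partial sums. An aggregate that needs every value of a group at once (a median, for example) cannot be pre-combined this way, so its input still has to be buffered.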
On Thu, Feb 13, 2020 at 8:01 PM Richard Moorhead <[hidden email]> wrote:
In batch mode, if the input is sorted prior to a group-by operation, does Flink forward the aggregated data early? Is there a way to prevent grouping operations from buffering all data in a GBK operation in batch mode?