If groupeby+reduceGroup is used, does each groupeby+reduceGroup take place on a single partition? If yes, if we have more groups than the partitions, what happens?
the groupBy + reduceGroup works across all partitions of a DataSet. This means that elements from each partition are grouped (creating potentially a new partitioning) and then for each group the reduceGroup function is executed.
Cheers,
Till
On Thu, Apr 20, 2017 at 5:14 PM, Mary m <[hidden email]> wrote:
Hi
If groupeby+reduceGroup is used, does each groupeby+reduceGroup take place on a single partition? If yes, if we have more groups than the partitions, what happens?