sorted cogroup

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

sorted cogroup

Michele Bertoni
Hi everybody, i need to execute a cogroup on sorted groups.
I explain it better: I have two datasets i.e. (key, value), I want to cogroup on key and then the have both iterator sorted by value
how can i get it?
I know iterator should be collected to be sorted but i want to avoid it. what happens if i partition datasets separately by key, then sort partition and finally cogroup by key? can I assume they keep the order on key?

which is the drawback in doing this?
I expect to have two data shuffling one partition and one for cogroup


thanks

Best
michele
Reply | Threaded
Open this post in threaded view
|

Re: sorted cogroup

Till Rohrmann

Hi Michele,

Flink supports coGroups on sorted inputs. If you have a ds1 = DataSet[(Key, Value1)] and ds2 = DataSet[(Key, Value2)] you obtain a sorted coGroup for example by:

ds1.coGroup(ds2).where(0).equalsTo(0).sortFirstGroup(1, Order.ASCENDING).sortSecondGroup(1, Order.DESCENDING)

Cheers,
Till


On Tue, Jul 21, 2015 at 7:27 AM, Michele Bertoni <[hidden email]> wrote:
Hi everybody, i need to execute a cogroup on sorted groups.
I explain it better: I have two datasets i.e. (key, value), I want to cogroup on key and then the have both iterator sorted by value
how can i get it?
I know iterator should be collected to be sorted but i want to avoid it. what happens if i partition datasets separately by key, then sort partition and finally cogroup by key? can I assume they keep the order on key?

which is the drawback in doing this?
I expect to have two data shuffling one partition and one for cogroup


thanks

Best
michele