[Flink] merge-sort for a DataStream

Jiewen Shao-2
In Flink, I have a DataStream<List<comparable_pojo>> in which each list is individually pre-sorted. What I need to do is persist everything in one shot, in global sort order. Any ideas on the best way to do this? I hope that makes sense.

Thanks in advance!

Re: [Flink] merge-sort for a DataStream

Kien Truong

Hi Jiewen,

Since a DataStream can have an infinite number of elements, you can't globally sort all of them.

If the number of elements is finite, you can use the DataSet API, which would look something like this:


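DataSet<List<comparable_pojo>> a;
// Flatten the pre-sorted lists into individual elements.
DataSet<comparable_pojo> aFlatten = a.flatMap(..);
// Range-partition so that partitions are ordered relative to each other, then sort within each partition.
DataSet<comparable_pojo> aSorted = aFlatten.partitionByRange(...).sortPartition(...);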
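
As a side note for the finite case: because each list is already sorted, once the lists are gathered in one place they can be combined with a k-way merge instead of a full re-sort. The following is only a minimal plain-Java sketch of that idea, not Flink API code. The names `KWayMerge`, `Cursor` and `merge` are hypothetical, and any `Comparable` type (for example `Integer`) stands in for the comparable POJO.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Flink: merges individually sorted lists into one sorted list.
final class KWayMerge {

    // Tracks the current smallest unconsumed element of one input list and the rest of that list.
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(T head, Iterator<T> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Assumes every inner list is already sorted in ascending natural order.
    static <T extends Comparable<? super T>> List<T> merge(List<? extends List<T>> sortedLists) {
        Comparator<Cursor<T>> byHead = (a, b) -> a.head.compareTo(b.head);
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);

        int total = 0;
        for (List<T> list : sortedLists) {
            total += list.size();
            Iterator<T> it = list.iterator();
            if (it.hasNext()) {
                // Seed the heap with the first element of each non-empty list.
                heap.add(new Cursor<>(it.next(), it));
            }
        }

        List<T> out = new ArrayList<>(total);
        while (!heap.isEmpty()) {
            // The heap root is the smallest head among all lists.
            Cursor<T> c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                // Advance this list and put it back into the heap.
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }
}
```

With k lists and n elements in total, this does O(n log k) comparisons, whereas flattening and re-sorting ignores the existing order. It only helps when all lists fit on a single machine; for data that must stay distributed, the range-partition-and-sort approach above still applies.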


Best regards.

Kien

On 11/15/2017 6:13 AM, Jiewen Shao wrote:
In Flink, I have a DataStream<List<comparable_pojo>> in which each list is individually pre-sorted. What I need to do is persist everything in one shot, in global sort order. Any ideas on the best way to do this? I hope that makes sense.

Thanks in advance!