Re: [Flink] merge-sort for a DataStream

Posted by Kien Truong on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-merge-sort-for-a-DataStream-tp16739p16776.html

Hi Jiewen,

Since a DataStream can have an infinite number of elements, you can't globally sort all of them.

If the number of elements is finite, you can use the DataSet API, which would look something like this:


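DataSet<List<comparable_pojo>> a;

// flatten the pre-sorted lists into individual elements
DataSet<comparable_pojo> aFlatten = a.flatMap(..);

// range-partition on the sort key, then sort each partition locally;
// reading the partitions in range order gives the global sort order
DataSet<comparable_pojo> aSorted = aFlatten.partitionByRange(...).sortPartition(..., Order.ASCENDING);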
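To show why range partitioning plus a per-partition sort is enough for a global order, here is a plain-Java simulation of the idea. It is not Flink code. The class and method names are hypothetical, Integer stands in for comparable_pojo, and the split points are hard-coded, whereas Flink picks the range boundaries itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation (no Flink involved) of flatMap + partitionByRange + sortPartition.
public class RangeSortSketch {

    // Flatten the pre-sorted lists into one collection, like flatMap(..) above.
    static List<Integer> flatten(List<List<Integer>> lists) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> l : lists) {
            out.addAll(l);
        }
        return out;
    }

    // Route each element to the partition whose key range contains it.
    // Partition i holds keys in (boundaries[i-1], boundaries[i]]; the boundaries are assumed sorted.
    static List<List<Integer>> partitionByRange(List<Integer> elements, int[] boundaries) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= boundaries.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int e : elements) {
            int p = 0;
            while (p < boundaries.length && e > boundaries[p]) {
                p++;
            }
            partitions.get(p).add(e);
        }
        return partitions;
    }

    // Sort each partition independently, like sortPartition(..) does per parallel instance.
    static void sortPartitions(List<List<Integer>> partitions) {
        for (List<Integer> p : partitions) {
            Collections.sort(p);
        }
    }

    // Concatenate the partitions in range order; because the ranges are disjoint
    // and each partition is sorted, the result is globally sorted.
    static List<Integer> concatenate(List<List<Integer>> partitions) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> p : partitions) {
            result.addAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> input = Arrays.asList(
                Arrays.asList(1, 5, 9),
                Arrays.asList(2, 3, 12),
                Arrays.asList(4, 7, 8));
        List<List<Integer>> partitions = partitionByRange(flatten(input), new int[] {4, 8});
        sortPartitions(partitions);
        System.out.println(concatenate(partitions));
    }
}
```

Running main prints [1, 2, 3, 4, 5, 7, 8, 9, 12]. No partition ever needs to see another partition's data, which is why the sort can run in parallel, but it only works once the whole input is available.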


Best regards.

Kien

On 11/15/2017 6:13 AM, Jiewen Shao wrote:
In Flink, I have a DataStream<List<comparable_pojo>> in which each list is individually pre-sorted. What I need to do is persist everything in one shot in global sort order. Any ideas on the best way to do this? Hope this makes sense.

Thanks in advance!