Why does groupBy(2).sortGroup(0, Order.DESCENDING) neither group nor sort?
I want to sort a DataSet. How can I do that?

    customers = customers.filter(
        new FilterFunction<Customer>() {
            @Override
            public boolean filter(Customer c) {
                return Integer.parseInt(c.getField(0).toString()) <= 5;
            }
        });

    customers.groupBy(2).sortGroup(0, Order.DESCENDING);
    System.out.println(customers.print());
    customers.writeAsCsv("/home/hadoop/Desktop/Dataset/output.csv", "\n", "|");
    env.execute();

    public static class Customer extends Tuple5<Long,String,String,String,String> {
    }

    private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env) {
        return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
            .fieldDelimiter('|')
            .includeFields("11100110").ignoreFirstLine()
            .tupleType(Customer.class);
    }

The result is not sorted:

    2> (1,Customer#000000001,IVhzIApeRb ot&&c&&E,711.56,BUILDING)
    2> (2,Customer#000000002,XSTf4&&NCwDVaWNe6tEgvwfmRchLXak,121.65,AUTOMOBILE)
    2> (3,Customer#000000003,MG9kdTD2WBHm,7498.12,AUTOMOBILE)
    2> (4,Customer#000000004,XxVSJsLAGtn,2866.83,MACHINERY)
    2> (5,Customer#000000005,KvpyuHCplrB84WgAiGV6sYpZq7Tj,794.47,HOUSEHOLD)
Hi. The sortGroup API returns a SortedGrouping object, but you don't use the result. I think you are confused about the groupBy and sortGroup APIs. You should use them as follows (I assume you are using 0.8 or 0.9-milestone-1):
    // Select the first 10 elements of each group.
    DataSet<Customer> sorted = customers.groupBy(2).sortGroup(0, Order.DESCENDING).first(10);
    // print() emits the data set itself, so it does not need to be wrapped in System.out.println.
    sorted.print();

Note that Flink currently does not support a global sort (FLINK-598), only local sorts. The sortGroup API sorts the elements within each group.

Regards,
Chiwan Park

> On Jun 2, 2015, at 5:02 AM, hagersaleh <[hidden email]> wrote:
>
> why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/why-when-use-groupBy-2-sortGroup-0-Order-DESCENDING-not-group-by-and-not-sort-tp1436.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
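
To make the semantics of groupBy(...).sortGroup(...).first(n) concrete, here is a minimal plain-Java sketch. It uses no Flink classes and only mimics what the chained call computes. The class SortGroupSketch, the Row type, and the method firstPerGroupSortedDesc are hypothetical names made up for this illustration. The sketch groups by the market segment (the last field in the output above) rather than field 2, because field 2 in the posted output looks like a unique address per row, which would put every customer in a group of its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration only; this is not Flink code.
class SortGroupSketch {

    /** Hypothetical stand-in for the Customer tuple, reduced to the two fields used here. */
    static final class Row {
        final long id;
        final String segment;

        Row(long id, String segment) {
            this.id = id;
            this.segment = segment;
        }

        @Override
        public String toString() {
            return "(" + id + "," + segment + ")";
        }
    }

    /**
     * Groups rows by segment, sorts each group by id in descending order,
     * and keeps at most n rows per group. The input list is not modified;
     * the caller has to use the returned map.
     */
    static Map<String, List<Row>> firstPerGroupSortedDesc(List<Row> rows, int n) {
        Map<String, List<Row>> groups = rows.stream()
                .collect(Collectors.groupingBy(r -> r.segment, TreeMap::new, Collectors.toList()));
        groups.replaceAll((segment, group) -> group.stream()
                .sorted(Comparator.comparingLong((Row r) -> r.id).reversed())
                .limit(n)
                .collect(Collectors.toList()));
        return groups;
    }

    public static void main(String[] args) {
        // Ids and segments taken from the five rows posted in the question.
        List<Row> rows = Arrays.asList(
                new Row(1, "BUILDING"),
                new Row(2, "AUTOMOBILE"),
                new Row(3, "AUTOMOBILE"),
                new Row(4, "MACHINERY"),
                new Row(5, "HOUSEHOLD"));

        Map<String, List<Row>> result = firstPerGroupSortedDesc(rows, 10);
        result.forEach((segment, group) -> System.out.println(segment + " -> " + group));
    }
}
```

Two things carry over to the Flink program: the order is established only inside each group (AUTOMOBILE yields ids 3, 2), and the sorted data exists only in the returned value. Calling the method and then printing the original rows shows the unsorted input, which matches the behaviour described in the question.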
You can also use sortPartition() to sort each partition locally.

On Jun 2, 2015 02:11, "Chiwan Park" <[hidden email]> wrote:
> Hi. The sortGroup API returns a SortedGrouping object, but you don't use the result.
I would like an example of how to use sortPartition().
Note that sortPartition was introduced in 0.9. The following link shows an example of sortPartition:
http://ci.apache.org/projects/flink/flink-docs-master/apis/dataset_transformations.html#sort-partition

Regards,
Chiwan Park

> On Jun 2, 2015, at 5:51 PM, hagersaleh <[hidden email]> wrote:
>
> I want example for use sortPartition()
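
As a rough illustration of what "sorting each partition locally" means, and why it is not a global sort, here is a plain-Java sketch that simulates partitions as lists. It is not Flink code: SortPartitionSketch and sortEachPartitionDesc are hypothetical names, and the split of the five customer ids across two partitions is an assumption made for the example. In a Flink 0.9 program the corresponding call would be along the lines of customers.sortPartition(0, Order.DESCENDING); running that operator with a parallelism of 1 is a common way to obtain a total order, at the cost of sorting everything in a single task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of partition-local sorting; this is not Flink code.
class SortPartitionSketch {

    /** Sorts every partition on its own in descending order; partitions are never merged. */
    static List<List<Long>> sortEachPartitionDesc(List<List<Long>> partitions) {
        List<List<Long>> result = new ArrayList<>();
        for (List<Long> partition : partitions) {
            List<Long> copy = new ArrayList<>(partition);
            copy.sort(Comparator.reverseOrder());
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed distribution of the five customer ids over two parallel partitions.
        List<List<Long>> twoPartitions = Arrays.asList(
                Arrays.asList(1L, 4L, 5L),
                Arrays.asList(2L, 3L));
        // Each partition is ordered, but the concatenation 5, 4, 1, 3, 2 is not.
        System.out.println(sortEachPartitionDesc(twoPartitions)); // [[5, 4, 1], [3, 2]]

        // With a single partition, the local sort is also a total order.
        List<List<Long>> onePartition = Arrays.asList(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        System.out.println(sortEachPartitionDesc(onePartition)); // [[5, 4, 3, 2, 1]]
    }
}
```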