why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort

hagersaleh
Why does groupBy(2).sortGroup(0, Order.DESCENDING) neither group nor sort my data?

I want to sort a DataSet. How can I do that?

customers = customers.filter(
        new FilterFunction<Customer>() {
            @Override
            public boolean filter(Customer c) {
                return Integer.parseInt(c.getField(0).toString()) <= 5;
            }
        });

customers.groupBy(2).sortGroup(0, Order.DESCENDING);
System.out.println(customers.print());
customers.writeAsCsv("/home/hadoop/Desktop/Dataset/output.csv", "\n", "|");
env.execute();


public static class Customer extends Tuple5<Long, String, String, String, String> {
}

private static DataSet<Customer> getCustomerDataSet(ExecutionEnvironment env) {
    return env.readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
            .fieldDelimiter('|')
            .includeFields("11100110")
            .ignoreFirstLine()
            .tupleType(Customer.class);
}

The result is not sorted:
2> (1,Customer#000000001,IVhzIApeRb ot&&c&&E,711.56,BUILDING)
2> (2,Customer#000000002,XSTf4&&NCwDVaWNe6tEgvwfmRchLXak,121.65,AUTOMOBILE)
2> (3,Customer#000000003,MG9kdTD2WBHm,7498.12,AUTOMOBILE)
2> (4,Customer#000000004,XxVSJsLAGtn,2866.83,MACHINERY)
2> (5,Customer#000000005,KvpyuHCplrB84WgAiGV6sYpZq7Tj,794.47,HOUSEHOLD)

Re: why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort

Chiwan Park
Hi. The sortGroup API returns a SortedGrouping object, but you don’t use the result. I think you are confusing the groupBy and sortGroup APIs. You should use the API as follows (I assume you are using 0.8 or 0.9-milestone-1):

// Select the first 10 records of each group, sorted by field 0 in descending order.
DataSet<Customer> sorted = customers.groupBy(2).sortGroup(0, Order.DESCENDING).first(10);
// print() emits the records itself; do not wrap it in System.out.println().
sorted.print();

Note that Flink currently does not support a global sort (FLINK-598), only local sorts. The sortGroup API sorts the elements within each group.
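
To make the two points above concrete (the result of a transformation has to be kept, and the sort applies within each group only), here is a minimal plain-Java sketch. It is not Flink code: the class GroupSortSketch, the Row type and the method firstPerGroupSortedDesc are hypothetical names that only model the behaviour with java.util collections. The grouping key used here is the last column of the sample output, chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "group, sort inside each group, take the first n".
// Hypothetical names throughout; this does not use the Flink API.
final class GroupSortSketch {

    // A record with an id (the sort field) and a grouping key.
    static final class Row {
        final long id;
        final String key;

        Row(long id, String key) {
            this.id = id;
            this.key = key;
        }

        @Override
        public String toString() {
            return "(" + id + "," + key + ")";
        }
    }

    // Returns a NEW list and leaves the input untouched, so a caller that
    // ignores the return value observes no change at all.
    static List<Row> firstPerGroupSortedDesc(List<Row> rows, int n) {
        // A TreeMap is used only to make the order of the groups deterministic.
        Map<String, List<Row>> groups = new TreeMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        List<Row> result = new ArrayList<>();
        for (List<Row> group : groups.values()) {
            // Sort inside this group only, by id in descending order.
            group.sort(Comparator.comparingLong((Row r) -> r.id).reversed());
            result.addAll(group.subList(0, Math.min(n, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> customers = new ArrayList<>();
        customers.add(new Row(1, "BUILDING"));
        customers.add(new Row(2, "AUTOMOBILE"));
        customers.add(new Row(3, "AUTOMOBILE"));
        customers.add(new Row(4, "MACHINERY"));
        customers.add(new Row(5, "HOUSEHOLD"));

        firstPerGroupSortedDesc(customers, 10);                    // result discarded
        List<Row> sorted = firstPerGroupSortedDesc(customers, 10); // result kept

        System.out.println(customers); // still in the original order
        System.out.println(sorted);    // sorted within each group
    }
}
```

In this sketch the second list comes out as (3,AUTOMOBILE), (2,AUTOMOBILE), (1,BUILDING), (5,HOUSEHOLD), (4,MACHINERY): ids are descending inside each group, but the list as a whole is not ordered by id, which is the difference between a per-group sort and a global sort.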


Regards,
Chiwan Park

> On Jun 2, 2015, at 5:02 AM, hagersaleh <[hidden email]> wrote:
>
> Why does groupBy(2).sortGroup(0, Order.DESCENDING) neither group nor sort my data?
>
> I want to sort a DataSet. How can I do that?


Re: why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort

Fabian Hueske-2

You can also use sortPartition() to sort all partitions locally.
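
What "sorting all partitions locally" implies can be modelled in plain Java. The sketch below is not Flink code: PartitionSortSketch and its methods are hypothetical names, and the round-robin split is only an assumed way of distributing records. It shows that each partition ends up ordered on its own while the combined output is generally not, and that with a single partition the local order is also the total order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of a partition-local sort. Hypothetical names throughout;
// this does not use the Flink API.
final class PartitionSortSketch {

    // Splits the values round-robin into `parallelism` partitions.
    // (An assumed distribution; a real system may spread records differently.)
    static List<List<Long>> partition(List<Long> values, int parallelism) {
        List<List<Long>> parts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < values.size(); i++) {
            parts.get(i % parallelism).add(values.get(i));
        }
        return parts;
    }

    // Sorts every partition on its own in descending order.
    // No record moves from one partition to another.
    static List<List<Long>> sortEachPartitionDesc(List<List<Long>> parts) {
        List<List<Long>> sorted = new ArrayList<>();
        for (List<Long> p : parts) {
            List<Long> copy = new ArrayList<>(p);
            copy.sort(Collections.reverseOrder());
            sorted.add(copy);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);

        // Two partitions: each one is sorted, the overall output is not.
        System.out.println(sortEachPartitionDesc(partition(ids, 2)));

        // One partition: the local sort is a total order.
        System.out.println(sortEachPartitionDesc(partition(ids, 1)));
    }
}
```

With two partitions the sketch prints [[5, 3, 1], [4, 2]], and with one partition it prints [[5, 4, 3, 2, 1]]. Whether running with a single partition is acceptable depends on the data size, since all records then pass through one sorter.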

On Jun 2, 2015 02:11, "Chiwan Park" <[hidden email]> wrote:
Hi. The sortGroup API returns a SortedGrouping object, but you don’t use the result. [...]


Re: why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort

hagersaleh
I would like an example of how to use sortPartition().

Re: why when use groupBy(2).sortGroup(0, Order.DESCENDING); not group by and not sort

Chiwan Park
Note that sortPartition was implemented in 0.9. The following link shows an example of its use:
http://ci.apache.org/projects/flink/flink-docs-master/apis/dataset_transformations.html#sort-partition

Regards,
Chiwan Park


> On Jun 2, 2015, at 5:51 PM, hagersaleh <[hidden email]> wrote:
>
> I would like an example of how to use sortPartition().