Wrong and inconsistent behavior of max


Wrong and inconsistent behavior of max

Maximilian Alber
Hi Flinksters,

I don't know if I did something wrong, but the code looks fine to me. Basically, the max function extracts the wrong element.

The error only shows up with my real data, not if I inject some test sequence into costs.

The problem is that the corresponding tuple value at position 0 is wrong. The maximum of the second field is detected correctly.

The code snippet:

val maxCost = costs map { x => (x.id, x.value) } max(1)

(costs map { x => (x.id, x.value) } map { _.toString } map { "first: " + _ })
  .union(maxCost map { _.toString } map { "second: " + _ })
  .writeAsText(config.outFile)

The output:

File content:
first: (47,42.066986)
first: (11,4.448255)
first: (40,42.06696)
first: (3,0.96731037)
first: (31,42.06443)
first: (18,23.753584)
first: (45,42.066986)
first: (24,41.44347)
first: (13,6.1290965)
first: (19,26.42948)
first: (1,0.9665109)
first: (28,42.04222)
first: (5,1.2986814)
first: (44,42.066986)
first: (7,1.8681992)
first: (10,3.0981758)
first: (41,42.066982)
first: (48,42.066986)
first: (21,33.698544)
first: (38,42.066963)
first: (30,42.06153)
first: (26,41.950237)
first: (43,42.066986)
first: (16,14.754578)
first: (15,10.571205)
first: (34,42.06672)
first: (29,42.055424)
first: (35,42.066845)
first: (8,1.9513339)
first: (22,38.189228)
first: (46,42.066986)
first: (2,0.966511)
first: (27,42.013676)
first: (12,5.4271784)
first: (42,42.066986)
first: (4,1.01561)
first: (14,7.4410205)
first: (25,41.803535)
first: (6,1.6827519)
first: (36,42.06694)
first: (20,28.834095)
first: (32,42.06577)
first: (49,42.066986)
first: (33,42.0664)
first: (9,2.2420964)
first: (37,42.066967)
first: (0,0.9665109)
first: (17,19.016153)
first: (39,42.06697)
first: (23,40.512672)
second: (23,42.066986)

File content end.


Thanks!
Cheers,
Max


Re: Wrong and inconsistent behavior of max

Fabian Hueske
Hi Max,

the max(i) function does not select the tuple with the maximum value.
Instead, it builds a new tuple containing the maximum value of the i-th field. The values of the tuple's other fields are not defined (in practice they are set to the values of the last tuple; however, the order of tuples is not defined).

The Java API features minBy and maxBy transformations that should do what you are looking for.
You can reimplement them for Scala as a simple GroupReduce (or Reduce) function, or use the Java functions in your Scala code.
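For example, a minimal Reduce-based sketch (untested, and assuming costs is mapped to a DataSet[(Int, Float)] as in your snippet) could look like this:

// Keep whichever tuple has the larger second field, so the complete
// winning element survives (on ties the left operand is kept).
val maxCost = costs
  .map { x => (x.id, x.value) }
  .reduce { (a, b) => if (a._2 >= b._2) a else b }

Unlike max(1), the reduce never recombines fields from different tuples, so the id always belongs to the maximum value.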

Best, Fabian



Re: Wrong and inconsistent behavior of max

Maximilian Alber
Hi Fabian!

Ok, thanks! Now it works.

Cheers,
Max


Re: Wrong and inconsistent behavior of max

Ufuk Celebi
This is not the first time people have been confused by this. I think most people expect the maxBy/minBy behaviour from max/min.

Maybe it makes sense to move back to the old aggregations API, where you call the aggregate method and specify as an argument which type of aggregation should be performed. I didn't really like it, but if the current state is confusing people, we should consider changing it again.
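Roughly that style, sketched against the aggregation enum the Java API already has (just an illustration, assuming a DataSet of tuples named data; a reworked method might look different):

import org.apache.flink.api.java.aggregation.Aggregations

// The aggregation type is passed as an argument instead of being a
// dedicated method; like max(1), this builds a result tuple whose
// field 1 holds the maximum.
val maxCost = data.aggregate(Aggregations.MAX, 1)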


Re: Wrong and inconsistent behavior of max

Fabian Hueske
I am not sure about this. With the new aggregations that Viktor is working on, things become pretty obvious IMO.

data.aggregate(count(), sum(2), min(0));
basically shows the structure of the result.

I would not go and overload min() and max() in different contexts.


Re: Wrong and inconsistent behavior of max

Ufuk Celebi
No, I didn't say that I want to overload min/max. I think Viktor's changes are exactly in line with what I said: min/max(X) or minBy/maxBy(X) could be (maybe deprecated) shortcuts for aggregate(min(X)) / aggregate(max(X)).


Re: Wrong and inconsistent behavior of max

Stephan Ewen
Viktor's work does not replace minBy or maxBy, right? (They are not aggregation functions, but selectors.) I think we would retain those anyway. They also make little sense in composite aggregations, because you can only ever use one of them...

So the question is whether to keep min()/max() or to make that functionality accessible only through Viktor's aggregations.



Re: Wrong and inconsistent behavior of max

Viktor Rosenfeld
Hi,

I would drop min/max/sum. I've marked them as deprecated in my pull request because I did not change their current implementation, which is based on the old aggregate.

I think it makes more sense to have convenience functions like countByKey, which Sebastian suggested, that encapsulate more complex behavior than an aggregation on a single field.

I would also rename minBy/maxBy to selectMin/selectMax or something similar (selectMinBy? selectMinByField?) to make their behavior more explicit.

Best,
Viktor

Re: Wrong and inconsistent behavior of max

Ufuk Celebi
@Stephan: Yap. :)

I agree with Viktor's suggestions. In order not to break existing programs, I would keep the deprecation annotations as you have them in your branch. We could then remove them in 0.9.0 or so. For minBy/maxBy I would rename them as you suggested, but keep the old methods and deprecate them as well.

Maybe someone can comment on the Javadocs changes here: https://github.com/apache/incubator-flink/pull/244

I hope that it makes the API clearer for now.


Re: Wrong and inconsistent behavior of max

Till Rohrmann
I don't know whether this is important, because we are talking about the Java API, but for people familiar with Scala or F#, for example, the methods minBy/maxBy make perfect sense. These methods are part of the standard collection API of the respective language and select the element with the minimum/maximum key. Of course, Flink's minBy/maxBy return a DataSet containing the element instead of the element itself.
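To illustrate with plain Scala collections (standard library semantics, nothing Flink-specific):

// maxBy/minBy return the whole element whose key is extremal,
// never a recombination of fields from different elements.
val costs = List((0, 0.97f), (47, 1.87f), (23, 42.07f))
costs.maxBy(_._2) // (23, 42.07f): the complete pair
costs.minBy(_._2) // (0, 0.97f)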
