Hi Flinksters,

I don't know if I did something wrong, but the code looks fine to me. Basically, the max function extracts a wrong element. The error only happens with my real data, not if I inject some test sequence into costs. The problem is that the corresponding tuple value at the first position is wrong; the maximum of the second field is detected correctly.

The code snippet:

    val maxCost = costs map {x => (x.id, x.value)} max(1)
    (costs map {x => (x.id, x.value)} map {_ toString} map {"first: "+ _ })
      union (maxCost map {_ toString} map {"second: "+ _ })
      writeAsText config.outFile

The output:

File content:
first: (47,42.066986)
first: (11,4.448255)
first: (40,42.06696)
first: (3,0.96731037)
first: (31,42.06443)
first: (18,23.753584)
first: (45,42.066986)
first: (24,41.44347)
first: (13,6.1290965)
first: (19,26.42948)
first: (1,0.9665109)
first: (28,42.04222)
first: (5,1.2986814)
first: (44,42.066986)
first: (7,1.8681992)
first: (10,3.0981758)
first: (41,42.066982)
first: (48,42.066986)
first: (21,33.698544)
first: (38,42.066963)
first: (30,42.06153)
first: (26,41.950237)
first: (43,42.066986)
first: (16,14.754578)
first: (15,10.571205)
first: (34,42.06672)
first: (29,42.055424)
first: (35,42.066845)
first: (8,1.9513339)
first: (22,38.189228)
first: (46,42.066986)
first: (2,0.966511)
first: (27,42.013676)
first: (12,5.4271784)
first: (42,42.066986)
first: (4,1.01561)
first: (14,7.4410205)
first: (25,41.803535)
first: (6,1.6827519)
first: (36,42.06694)
first: (20,28.834095)
first: (32,42.06577)
first: (49,42.066986)
first: (33,42.0664)
first: (9,2.2420964)
first: (37,42.066967)
first: (0,0.9665109)
first: (17,19.016153)
first: (39,42.06697)
first: (23,40.512672)
second: (23,42.066986)
File content end.

Thanks!
Cheers,
Max
Hi Max, The max(i) function does not select the Tuple with the maximum value. Instead, it builds a new Tuple with the maximum value for the i-th attribute. The values of the Tuple's other fields are not defined (in practice they are set to the values of the last Tuple, but the order of Tuples is not defined). Best, Fabian 2014-11-27 16:14 GMT+01:00 Maximilian Alber <[hidden email]>:
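To see the difference concretely, here is a plain-Scala analogy using ordinary collections, not the Flink DataSet API (the sample tuples are made up): maxBy selects a whole element, while a field-wise max(i)-style aggregation only guarantees the aggregated field, modeled here by pairing the maximum value with the id of the last tuple seen.

```scala
// Analogy only: Seq stands in for a DataSet of (id, value) pairs.
object MaxVsMaxBy {
  val costs = Seq((47, 42.0f), (11, 4.4f), (23, 40.5f), (3, 0.9f))

  // maxBy: selects the whole tuple with the maximum second field.
  val byTuple = costs.maxBy(_._2)                   // (47,42.0)

  // max(i)-style aggregation: only field i is well-defined in the result.
  // The other field is modeled with the last tuple seen, which is why
  // the id in the result can be "wrong".
  val fieldWise = (costs.last._1, costs.map(_._2).max)  // (3,42.0)
}
```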
Hi Fabian! Ok, thanks! Now it works. Cheers, Max On Fri, Nov 28, 2014 at 1:47 AM, Fabian Hueske <[hidden email]> wrote:
This is not the first time that people have been confused by this. I think most people expect the maxBy/minBy behaviour for max/min. Maybe it makes sense to move back to the old aggregations API, where you call the aggregate method and specify as an argument which type of aggregation should be performed. I didn't really like that, but if the current state confuses people, we should consider changing it again. On Fri, Nov 28, 2014 at 12:31 PM, Maximilian Alber <[hidden email]> wrote:
I am not sure about this. With the new aggregations that Viktor is working on, things become pretty obvious IMO: data.aggregate(count(), sum(2), min(0)); 2014-11-28 14:53 GMT+01:00 Ufuk Celebi <[hidden email]>:
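To make concrete what such a composite call would compute, here is a plain-Scala sketch over ordinary collections (the sample data is invented, and count()/sum(2)/min(0) are the proposed names from the discussion, not an existing API):

```scala
// Sketch of a composite aggregate(count(), sum(2), min(0)) on a
// dataset of 3-field tuples; field indices refer to tuple positions.
object CompositeAggregate {
  val data = Seq((1, "a", 2.0), (4, "b", 3.5), (2, "c", 1.5))

  val count = data.size           // count(): number of elements    -> 3
  val sum2  = data.map(_._3).sum  // sum(2): sum of field 2         -> 7.0
  val min0  = data.map(_._1).min  // min(0): minimum of field 0     -> 1
}
```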
No, I didn't say that I want to overload min/max. I think Viktor's changes are exactly in line with what I said. min/max(X) or minBy/maxBy(X) could be (maybe deprecated) shortcuts for aggregate(max/min(X)). On Fri, Nov 28, 2014 at 2:59 PM, Fabian Hueske <[hidden email]> wrote:
Viktor's work does not replace minBy or maxBy, right (they are not agg functions, but selectors)? I think we would retain those anyway. They also make little sense in composite aggregations, because you can only ever use one of them... So, the question is whether to keep min()/max() or to make that functionality accessible only through Viktor's aggregations. On Fri, Nov 28, 2014 at 4:32 PM, Ufuk Celebi <[hidden email]> wrote:
Hi,
I would drop min/max/sum. I've marked them deprecated in my pull request because I did not change their current implementation, which is based on the old aggregate. I think it makes more sense to have convenience functions like countByKey, which Sebastian suggested, that encapsulate more complex behavior than an aggregation on a single field. I would also rename minBy/maxBy to selectMin/selectMax or something similar (selectMinBy? selectMinByField?) to make their behavior more explicit. Best, Viktor
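A hypothetical countByKey could, for example, behave like this plain-Scala sketch (ordinary collections, invented sample data; the name is Sebastian's suggestion, not an existing Flink method):

```scala
// Sketch of countByKey semantics: group by a key field, then
// count the elements in each group.
object CountByKeySketch {
  val events = Seq(("a", 1), ("b", 2), ("a", 3), ("a", 4))

  val countByKey: Map[String, Int] =
    events.groupBy(_._1).map { case (k, vs) => (k, vs.size) }
  // -> Map("a" -> 3, "b" -> 1)
}
```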
@Stephan: Yap. :) Maybe someone can comment on the Javadoc changes here: https://github.com/apache/incubator-flink/pull/244 I agree with Viktor's suggestions. In order not to break existing programs, I would keep the deprecation annotation as you have it in your branch; we could then remove them in 0.9.0 or so. For minBy/maxBy I would rename them as you suggested, but keep the old methods and deprecate them as well. On Fri, Nov 28, 2014 at 6:23 PM, Viktor Rosenfeld <[hidden email]> wrote:
I don't know whether this is important, because we are talking about the Java API, but for people familiar with Scala or F#, for example, the methods minBy/maxBy make perfect sense. These methods are part of the standard collection API of the respective language and select the element with the minimum/maximum selected key. Of course, Flink's minBy/maxBy return a DataSet containing the element instead of the element itself. On Fri, Nov 28, 2014 at 6:59 PM, Ufuk Celebi <[hidden email]> wrote:
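For reference, this is what Scala's standard collections do (a tiny self-contained example with made-up data): minBy/maxBy select the element with the extreme key, which is the behavior most users seem to expect from Flink's minBy/maxBy.

```scala
// scala.collection minBy/maxBy: return the element itself, chosen
// by the key function; ties go to the first occurrence.
object StdlibMinByMaxBy {
  val words = Seq("flink", "spark", "hadoop")

  val shortest = words.minBy(_.length)  // "flink" (first of the 5-letter ties)
  val longest  = words.maxBy(_.length)  // "hadoop"
}
```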