count + aggregation


Alieh
Hello all,

1st question:
Is there any way to know the count or the content of a Flink DataSet
without using count() or collect()? The problem is that I have a loop
whose number of iterations depends on the count of a DataSet. Using
count() may force the whole pipeline to be executed again. I would
rather not use delta or bulk iteration.

2nd question:

Using "Aggregations.MAX" on the second field of a DataSet of
Tuple2<String, Integer>, I observed that the second field of the result
is the true maximum of the whole dataset, but the first field is not the
one that belongs to that maximum!

Best,
Alieh

Re: count + aggregation

Fabian Hueske-2
Hi Alieh,

I'm not aware of a solution to the first problem, but for the second issue you should use maxBy() instead of max().
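
To illustrate the difference: a field-wise aggregation such as dataSet.max(1) only guarantees the value of the aggregated field, whereas dataSet.maxBy(1) selects a whole tuple. The sketch below is plain Java, not Flink code. The Pair class and both helper methods are hypothetical stand-ins, and carrying over the first record's String is only an assumption made to show the mismatch (in Flink the non-aggregated fields are simply not defined).

```java
import java.util.Arrays;
import java.util.List;

class MaxVsMaxBy {

    /** Hypothetical stand-in for Flink's Tuple2<String, Integer>. */
    static final class Pair {
        final String f0;
        final int f1;

        Pair(String f0, int f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + ")";
        }
    }

    // Models a field-wise max: only f1 is aggregated. The f0 value is
    // carried over from the first record (an assumption for illustration),
    // so it need not belong to the record that holds the maximum.
    static Pair maxField(List<Pair> data) {
        Pair result = data.get(0);
        for (Pair p : data) {
            result = new Pair(result.f0, Math.max(result.f1, p.f1));
        }
        return result;
    }

    // Models maxBy: selects the complete record whose f1 is largest,
    // so f0 and f1 always come from the same record.
    static Pair maxByField(List<Pair> data) {
        Pair best = data.get(0);
        for (Pair p : data) {
            if (p.f1 > best.f1) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Pair> data = Arrays.asList(
                new Pair("a", 1), new Pair("b", 7), new Pair("c", 3));
        System.out.println(maxField(data));   // prints (a,7)
        System.out.println(maxByField(data)); // prints (b,7)
    }
}
```

In an actual Flink job on a DataSet<Tuple2<String, Integer>>, the corresponding calls would be max(1) and maxBy(1).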

Best, Fabian

In reply to Alieh's message of 2017-09-04 16:08 GMT+02:00.