Hi,

How can I perform a reduce operation on a group of datasets using Flink? Let's say my map function produces N datasets: d1, d2, ... dN. I want to run my reduce operation over all N datasets at once, not on each one individually. The only way I have found so far is to apply the union operator first, like this:

    List<DataSet<X>> dataList = Arrays.asList(d1, d2, ... dN);
    // start from the first DataSet; calling union on a null reference would throw
    DataSet<X> dFinal = dataList.get(0);
    for (DataSet<X> ds : dataList.subList(1, dataList.size())) {
        dFinal = dFinal.union(ds);
    }
    dFinal.groupBy(0).reduce(...);

Is there a more efficient way of doing this with the Java API? GroupReduce only works on a single dataset at a time, and I can't find any other methods that take multiple datasets as input.

Thanks,
-- Ritesh Kumar Singh
I think union is what you are looking for. Note that all data sets must be of the same type.

2016-05-18 16:15 GMT+02:00 Ritesh Kumar Singh <[hidden email]>:
Thanks for the reply, Fabian.

There is one small thing I found on the documentation page, though. The Union section says: "This operation happens implicitly if more than one data set is used for a specific function input." I'm not sure what this is supposed to mean. My initial assumption was something like:

    dFinal = dFinal.union(d1, d2, ... , dN); // passing more than one dataset as function input

But, as expected, this does not match the union method signature. So is that sentence supposed to mean something else? Or is it a feature that is not supported in Flink 0.8 but works in later releases?

Thanks,
-- Ritesh Kumar Singh
I think that sentence is misleading and refers to the internals of Flink. It should be removed, IMO. You can only union two DataSets. If you want to union more, you have to do it one by one.

2016-05-19 14:44 GMT+02:00 Ritesh Kumar Singh <[hidden email]>:
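
The "one by one" union described in the reply above can be wrapped in a small helper. What follows is only a sketch, not part of Flink: `UnionAll`, `unionAll` and `concat` are hypothetical names, and plain `java.util.List` stands in for a DataSet so the example runs without Flink on the classpath. It seeds the fold with the first input instead of `null`, then applies a two-argument combine step to each remaining input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class UnionAll {

    /**
     * Folds a non-empty list of inputs into one value by applying a
     * two-argument combine step from left to right. Seeding with the
     * first element avoids starting from a null reference.
     */
    static <T> T unionAll(List<T> inputs, BinaryOperator<T> pairwiseUnion) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("need at least one input");
        }
        T result = inputs.get(0);
        for (T next : inputs.subList(1, inputs.size())) {
            result = pairwiseUnion.apply(result, next);
        }
        return result;
    }

    /** Stand-in for a pairwise union: concatenates two lists, keeping duplicates. */
    static <E> List<E> concat(List<E> a, List<E> b) {
        List<E> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        List<Integer> all = unionAll(parts, UnionAll::concat);
        System.out.println(all); // prints [1, 2, 3, 4, 5]
    }
}
```

With Flink available, the same helper should accept the list from the original question, for example `unionAll(dataList, DataSet::union)`, since `union` takes exactly one other DataSet of the same type. The result can then be passed to `groupBy(0).reduce(...)` as before.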