How to iterate over DataSet elements without converting it to List


subashbasnet
Hello there,

I am stuck on how to iterate over a DataSet, perform operations on its elements, and return a new, modified DataSet, the way I would with a list, as shown below.
E.g.:
for (Centroid centroid : centroids.collect()) {
    for (Tuple2<Integer, Point> element : clusteredPoints.collect()) {
        // perform necessary operations
    }
    // add elements
}
// return elements list

I need to iterate over the DataSet without converting it to a List.


Best Regards,
Subash Basnet

Re: How to iterate over DataSet elements without converting it to List

Judit Fehér
Hi,

If you want to iterate through a DataSet, you can use the map function on the DataSet instead of for loops.
Your example has nested loops; instead of these, you can join the two DataSets
and then apply the map function to the joined result.
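
As a rough illustration of the advice above, here is a minimal sketch, assuming the Flink 1.x Java DataSet API. The class name, the one-dimensional `Double` points, and the sample values are hypothetical stand-ins for the `Centroid` and `Point` types in the question. Because the nested loops visit every (centroid, point) combination and there is no join key, the sketch pairs the two DataSets with `cross` rather than a keyed `join`, then applies `map` to each pair.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class PairwiseDistances {

    // Returns one (centroidId, distance) record per (centroid, point) pair.
    public static DataSet<Tuple2<Integer, Double>> distances(ExecutionEnvironment env) {
        // Hypothetical sample data: (centroidId, position) and plain point positions.
        DataSet<Tuple2<Integer, Double>> centroids = env.fromElements(
                new Tuple2<Integer, Double>(1, 0.0),
                new Tuple2<Integer, Double>(2, 10.0));
        DataSet<Double> points = env.fromElements(1.0, 9.0);

        // cross() replaces the two nested loops: it emits every (centroid, point) pair.
        DataSet<Tuple2<Tuple2<Integer, Double>, Double>> pairs = centroids.cross(points);

        // map() replaces the loop body: it is called once per pair.
        return pairs.map(
                new MapFunction<Tuple2<Tuple2<Integer, Double>, Double>, Tuple2<Integer, Double>>() {
                    @Override
                    public Tuple2<Integer, Double> map(Tuple2<Tuple2<Integer, Double>, Double> pair) {
                        Tuple2<Integer, Double> centroid = pair.f0;
                        double point = pair.f1;
                        return new Tuple2<Integer, Double>(centroid.f0, Math.abs(centroid.f1 - point));
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        // print() triggers execution; the transformations above never build a List.
        distances(ExecutionEnvironment.getExecutionEnvironment()).print();
    }
}
```

The result is itself a DataSet, so further transformations can be chained onto it without calling `collect()`. Note that `cross` produces one record per combination, which can get expensive when both inputs are large.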
It looks like you may want to implement a k-means algorithm. If so, you might want to check out the Flink k-means example in the machine learning library; it also shows how to iterate through a DataSet with the map function.
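
For the k-means case, the assignment step could look roughly like the sketch below. It is not a copy of the Flink example; it only assumes the Flink 1.x Java DataSet API, and the class names, the broadcast name "centroids", and the one-dimensional sample data are all hypothetical. The small centroid DataSet is made available inside the map function as a broadcast variable, so each point is compared with every centroid without calling `collect()` in the driver program.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

import java.util.Collection;

public class NearestCentroid {

    // Tags each point with the id of its closest centroid: (centroidId, point).
    public static class SelectNearest extends RichMapFunction<Double, Tuple2<Integer, Double>> {
        private Collection<Tuple2<Integer, Double>> centroids;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Read the broadcast centroids once per parallel task instance.
            centroids = getRuntimeContext().getBroadcastVariable("centroids");
        }

        @Override
        public Tuple2<Integer, Double> map(Double point) {
            int bestId = -1;
            double bestDistance = Double.MAX_VALUE;
            for (Tuple2<Integer, Double> centroid : centroids) {
                double distance = Math.abs(centroid.f1 - point);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    bestId = centroid.f0;
                }
            }
            return new Tuple2<Integer, Double>(bestId, point);
        }
    }

    public static DataSet<Tuple2<Integer, Double>> assign(ExecutionEnvironment env) {
        // Hypothetical sample data: (centroidId, position) and plain point positions.
        DataSet<Tuple2<Integer, Double>> centroids = env.fromElements(
                new Tuple2<Integer, Double>(1, 0.0),
                new Tuple2<Integer, Double>(2, 10.0));
        DataSet<Double> points = env.fromElements(1.0, 9.0);

        // The inner loop over centroids runs inside map(); the outer loop is the map itself.
        return points.map(new SelectNearest()).withBroadcastSet(centroids, "centroids");
    }

    public static void main(String[] args) throws Exception {
        assign(ExecutionEnvironment.getExecutionEnvironment()).print();
    }
}
```

A broadcast variable suits this case because the centroid set is small enough to hold in memory on every worker; if both inputs were large, pairing them with a DataSet-level operation would be the safer choice.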

Best Regards,
Judit

In reply to the message from subash basnet, 2016-02-18 15:29 GMT+01:00.