How to perform Broadcast and groupBy in DataStream like DataSet

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

How to perform Broadcast and groupBy in DataStream like DataSet

subashbasnet
Hello all,

How could we perform withBroadcastSet and groupBy in DataStream like that of DataSet in the below KMeans code:

DataSet<Centroid> newCentroids = points
.map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
.map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator())
.map(new CentroidAverager());


DataStream<Centroid> newCentroids = points.map(new SelectNearestCenter()).???


Best Regards,
Subash Basnet
Reply | Threaded
Open this post in threaded view
|

Re: How to perform Broadcast and groupBy in DataStream like DataSet

stefanobaghino
I'm not sure in regards of "withBroadcastSet", but in the DataStream you "keyBy" instead of "groupBy".

On Tue, May 3, 2016 at 12:35 PM, subash basnet <[hidden email]> wrote:
Hello all,

How could we perform withBroadcastSet and groupBy in DataStream like that of DataSet in the below KMeans code:

DataSet<Centroid> newCentroids = points
.map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
.map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator())
.map(new CentroidAverager());


DataStream<Centroid> newCentroids = points.map(new SelectNearestCenter()).???


Best Regards,
Subash Basnet



--
BR,
Stefano Baghino

Software Engineer @ Radicalbit
Reply | Threaded
Open this post in threaded view
|

Re: How to perform Broadcast and groupBy in DataStream like DataSet

subashbasnet
Hello Stefano, 

Thank you, I found out that just sometime ago that I could use keyBy, but I couldn't find how to set and getBroadcastVariable in datastream like that of dataset.  
For example in below code we get collection of centroids via broadcast. 

Eg: In KMeans.java
class X extends MapFunctions<>{
  private Collection<Centroid> centroids;
  public void open(Configuration parameters) throws Exception {
this.centroids = getRuntimeContext().getBroadcastVariable("centroids");
  }
  for (Centroid cent : centroids) {
  }
}


Best Regards,
Subash Basnet

On Tue, May 3, 2016 at 4:04 PM, Stefano Baghino <[hidden email]> wrote:
I'm not sure in regards of "withBroadcastSet", but in the DataStream you "keyBy" instead of "groupBy".

On Tue, May 3, 2016 at 12:35 PM, subash basnet <[hidden email]> wrote:
Hello all,

How could we perform withBroadcastSet and groupBy in DataStream like that of DataSet in the below KMeans code:

DataSet<Centroid> newCentroids = points
.map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
.map(new CountAppender()).groupBy(0).reduce(new CentroidAccumulator())
.map(new CentroidAverager());


DataStream<Centroid> newCentroids = points.map(new SelectNearestCenter()).???


Best Regards,
Subash Basnet



--
BR,
Stefano Baghino

Software Engineer @ Radicalbit