How to register aggregation convergence criterion to bulk iteration in scala API?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

How to register aggregation convergence criterion to bulk iteration in scala API?

Fridtjof Sander
Hi,

I want to register a custom aggregation convergence criterion to a bulk
iteration and I want to use the scala API.
It appears to me that this is not possible at the moment, right?

The AggregatorRegistry is exposed by IterativeDataSet.java, which is
hidden by DataSet.scala:

   def iterate(maxIterations: Int)(stepFunction: (DataSet[T]) =>
DataSet[T]): DataSet[T] = {
     val iterativeSet =
       new IterativeDataSet[T](
         javaSet.getExecutionEnvironment,
         javaSet.getType,
         javaSet,
         maxIterations)

     val resultSet = stepFunction(wrap(iterativeSet))
     val result = iterativeSet.closeWith(resultSet.javaSet)
     wrap(result)
   }

I am aware of the iterateWithTermination-possibility and it's a
work-around for me, but I guess the aggregated convergence criterion
would be more efficient.
Am I missing something?

Best,
Fridtjof
Reply | Threaded
Open this post in threaded view
|

Re: How to register aggregation convergence criterion to bulk iteration in scala API?

Stephan Ewen
You are right, that is currently missing in the Scala API. Would be good to add this, for feature completeness in the Scala API.

As a workaround for now: Can you access the Java IterativeDataSet from the Scala data set, and register it there?

Greetings,
Stephan


On Thu, Jan 28, 2016 at 11:05 PM, Fridtjof Sander <[hidden email]> wrote:
Hi,

I want to register a custom aggregation convergence criterion to a bulk iteration and I want to use the scala API.
It appears to me that this is not possible at the moment, right?

The AggregatorRegistry is exposed by IterativeDataSet.java, which is hidden by DataSet.scala:

  def iterate(maxIterations: Int)(stepFunction: (DataSet[T]) => DataSet[T]): DataSet[T] = {
    val iterativeSet =
      new IterativeDataSet[T](
        javaSet.getExecutionEnvironment,
        javaSet.getType,
        javaSet,
        maxIterations)

    val resultSet = stepFunction(wrap(iterativeSet))
    val result = iterativeSet.closeWith(resultSet.javaSet)
    wrap(result)
  }

I am aware of the iterateWithTermination-possibility and it's a work-around for me, but I guess the aggregated convergence criterion would be more efficient.
Am I missing something?

Best,
Fridtjof