(DEPRECATED) Apache Flink User Mailing List archive.

Convergence Criterion in IterativeDataSet

Andres R. Masegosa

Convergence Criterion in IterativeDataSet

Hi,

I am trying to implement some machine learning algorithms that involve
several iterations until convergence (to a fixed point).

My idea is to use an IterativeDataSet with an Aggregator that produces
the result (i.e. a set of parameters defining the model).

From the ConvergenceCriterion interface, I understand that the
convergence criterion depends only on the result of the aggregator in
the current iteration (as is the case with the DoubleZeroConvergence class).
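To make the shape of that interface concrete, here is a minimal plain-Java sketch. The ConvergenceCriterion interface below is a hypothetical local stand-in that mirrors the single method of Flink's interface, isConverged(int iteration, T value), using Double instead of Flink's DoubleValue so the example runs without Flink. ZeroConvergence is an illustrative name for a stateless check in the spirit of DoubleZeroConvergence, not the Flink class itself.

```java
import java.io.Serializable;

public class ZeroCheckDemo {
    // Hypothetical stand-in for Flink's ConvergenceCriterion; Double replaces
    // Flink's DoubleValue so this compiles without Flink on the classpath.
    interface ConvergenceCriterion<T> extends Serializable {
        boolean isConverged(int iteration, T value);
    }

    // Stateless check in the spirit of DoubleZeroConvergence: it looks only
    // at the aggregate of the current round and ignores earlier rounds.
    static class ZeroConvergence implements ConvergenceCriterion<Double> {
        @Override
        public boolean isConverged(int iteration, Double value) {
            return value == 0.0;
        }
    }

    public static void main(String[] args) {
        ConvergenceCriterion<Double> check = new ZeroConvergence();
        System.out.println(check.isConverged(1, 3.5)); // false
        System.out.println(check.isConverged(2, 0.0)); // true
    }
}
```

Because the method receives only the current round's value, nothing in the signature itself hands over the previous round's result.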

However, it is more common to test convergence by comparing the result of
the aggregator in the current iteration with its result in the previous
iteration: one usually stops when the two results are close enough, i.e.
when the iteration has converged to a fixed point.
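The fixed-point test described here does not depend on Flink. As a minimal sketch, assuming a hypothetical update map f (Math::cos is used purely as an example) and an assumed tolerance epsilon, the loop below stops once two successive iterates differ by less than epsilon, with a maximum round count as a safety net:

```java
import java.util.function.DoubleUnaryOperator;

public class FixedPointDemo {
    // Iterates x_{k+1} = f(x_k) and stops as soon as two successive iterates
    // differ by less than epsilon, or after maxIterations rounds.
    static double iterate(DoubleUnaryOperator f, double x0,
                          double epsilon, int maxIterations) {
        double previous = x0;
        for (int i = 1; i <= maxIterations; i++) {
            double current = f.applyAsDouble(previous);
            // Compare this round's result with the previous round's result.
            if (Math.abs(current - previous) < epsilon) {
                return current;
            }
            previous = current;
        }
        return previous; // budget exhausted: return the last iterate
    }

    public static void main(String[] args) {
        double x = iterate(Math::cos, 1.0, 1e-10, 1000);
        System.out.println(x); // approximately 0.739085
    }
}
```

For cos the loop settles near 0.739085, the point where cos(x) = x.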

I guess this functionality is not included yet, which would explain why
the Flink implementations of K-Means and Linear Regression simply stop
after a fixed number of iterations.

Am I wrong?

Regards
Andres

Stephan Ewen

Re: Convergence Criterion in IterativeDataSet

I think you can do this with the current interface. The convergence criterion object stays around, so you should be able to simply store the current aggregator value in a field each time the check is invoked. Every round except the first can then compare against that field.
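A minimal sketch of this suggestion, under stated assumptions: the ConvergenceCriterion interface is a hypothetical local stand-in mirroring Flink's isConverged(int iteration, T value) with Double in place of DoubleValue, and DeltaConvergence, epsilon and the sample aggregates are illustrative, made-up names and values. The criterion keeps the previous round's aggregate in a field and compares against it on every round but the first, relying on the criterion object staying alive between rounds as described above.

```java
import java.io.Serializable;

public class PreviousValueCriterionDemo {
    // Hypothetical stand-in mirroring Flink's ConvergenceCriterion; Double
    // replaces DoubleValue so this runs without Flink.
    interface ConvergenceCriterion<T> extends Serializable {
        boolean isConverged(int iteration, T value);
    }

    // Remembers the aggregate from the previous round in a field and reports
    // convergence once two successive aggregates differ by less than epsilon.
    static class DeltaConvergence implements ConvergenceCriterion<Double> {
        private final double epsilon;
        private Double previous; // null until the first round has been seen

        DeltaConvergence(double epsilon) {
            this.epsilon = epsilon;
        }

        @Override
        public boolean isConverged(int iteration, Double value) {
            // The first round has nothing to compare against.
            boolean converged = previous != null
                    && Math.abs(value - previous) < epsilon;
            previous = value; // keep for the next round's comparison
            return converged;
        }
    }

    public static void main(String[] args) {
        ConvergenceCriterion<Double> check = new DeltaConvergence(1e-3);
        double[] aggregates = {10.0, 5.0, 4.9, 4.8995}; // made-up values
        for (int i = 0; i < aggregates.length; i++) {
            System.out.println("round " + (i + 1) + ": "
                    + check.isConverged(i + 1, aggregates[i]));
        }
    }
}
```

With these made-up aggregates, rounds 1 to 3 print false and round 4 prints true, since 4.8995 differs from 4.9 by less than 1e-3. Testing previous for null, rather than iteration == 1, avoids assuming how rounds are numbered. In a real Flink job the equivalent class would implement ConvergenceCriterion<DoubleValue> and, as far as we know, would be registered together with its aggregator via IterativeDataSet.registerAggregationConvergenceCriterion.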

On Fri, Sep 4, 2015 at 2:25 PM, Andres R. Masegosa <[hidden email]> wrote:

Sachin Goel

Re: Convergence Criterion in IterativeDataSet

Hi Andres,
Does something like this solve what you're trying to achieve?
https://github.com/apache/flink/pull/918/files

Regards
Sachin

On Sep 4, 2015 6:24 PM, "Stephan Ewen" <[hidden email]> wrote: