Nested iterations


Supun Kamburugamuve
Hi,

Does Flink support nested iterations? We are trying to develop a complex machine learning algorithm that has three nested iterations.

Best,
Supun..



Re: Nested iterations

Gábor Gévay
Hello Supun,

Unfortunately, nesting of Flink's iteration constructs is not
supported at the moment.

There are some workarounds though:

1. You can start a Flink job for each step of the iteration. Starting
a Flink job has some overhead, so this only works if there is a
sufficient amount of work in each iteration step. Moreover, this has
the disadvantage that the intermediate results always need to be
written out and then read back between steps, which might have a
considerable performance impact.

2. If you have just a small fixed number of steps, then you can have a
for loop that "unrolls" all the iteration steps, and creates one large
Flink job. The code will be somewhat similar to the first approach,
but you don't call execute between the steps, and you don't write
intermediate results to a sink, but just use the DataSet from the
previous step. The disadvantage of this is that you might end up with
a Flink job that is too large, which might also hurt performance.
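
To make the difference between the two workarounds concrete, here is a minimal sketch in plain Python. It is only a model, not Flink code: `step`, `run_as_separate_jobs`, `build_unrolled_plan`, `execute`, and `unrolled_plan_size` are hypothetical names, the JSON files stand in for a sink and a source, and the list of functions stands in for the job graph that is built up before execute is called.

```python
import json
import os
import tempfile


def step(values):
    """Hypothetical iteration step; stands in for one pass of the algorithm."""
    return [v * 2 + 1 for v in values]


def run_as_separate_jobs(initial, num_steps, work_dir):
    """Workaround 1: one 'job' per step, results written out and read back."""
    path = os.path.join(work_dir, "step_0.json")
    with open(path, "w") as f:
        json.dump(initial, f)
    for i in range(1, num_steps + 1):
        with open(path) as f:            # read the previous step's output
            current = json.load(f)
        result = step(current)           # the work done by this step's job
        path = os.path.join(work_dir, f"step_{i}.json")
        with open(path, "w") as f:       # write it out for the next job
            json.dump(result, f)
    with open(path) as f:
        return json.load(f)


def build_unrolled_plan(num_steps):
    """Workaround 2: a for loop chains every step into one plan; nothing runs yet."""
    return [step for _ in range(num_steps)]


def execute(plan, initial):
    """Run the whole plan once, passing each result straight to the next step."""
    current = initial
    for op in plan:
        current = op(current)
    return current


def unrolled_plan_size(*loop_counts):
    """Number of step copies in the plan when nested loops are fully unrolled."""
    size = 1
    for count in loop_counts:
        size *= count
    return size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        print(run_as_separate_jobs([1, 2, 3], 3, work_dir))
    print(execute(build_unrolled_plan(3), [1, 2, 3]))
    print(unrolled_plan_size(10, 20, 5))
```

Both variants compute the same result; the first pays for a write and a read on every step, while the second grows with the number of steps. For example, three nested loops of 10, 20, and 5 rounds would unroll into 1000 copies of the step, which is why unrolling only suits a small, fixed number of steps.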

Best,
Gábor


Re: Nested iterations

Supun Kamburugamuve
Thanks Gabor. I was thinking about starting separate jobs.

Are there any plans to support nested loops in the future?

Thanks,
Supun..

--
Supun Kamburugamuve
Member, Apache Software Foundation; http://www.apache.org
E-mail: [hidden email];  Mobile: +1 812 219 2563



Re: Nested iterations

Gábor Gévay
I don't think that there are plans for enabling the nesting of the
native iteration constructs, but we should wait for one of the
committers to confirm this.

However, the matter of caching intermediate results has come up on
numerous occasions before [1,2,3,4,5], and it would be useful in lots
of other situations as well, so there is hope that it will be
implemented some day, which would make the first workaround above
more feasible.
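
As a rough illustration of why caching would help, here is a small plain-Python model of a driver that runs one job per step but keeps the intermediate result in memory between jobs instead of writing it out and reading it back. This is purely conceptual and does not correspond to any Flink API: `CachingSession`, `run_job`, `step`, and `iterate_with_cache` are all hypothetical names.

```python
class CachingSession:
    """Hypothetical session that keeps named results alive between jobs."""

    def __init__(self):
        self._results = {}
        self.jobs_run = 0

    def put(self, name, values):
        """Register an initial data set under a name."""
        self._results[name] = list(values)

    def run_job(self, func, source, target):
        """Run one 'job': read a cached result, apply func, cache the output."""
        self._results[target] = func(self._results[source])
        self.jobs_run += 1

    def get(self, name):
        """Return a copy of a cached result."""
        return list(self._results[name])


def step(values):
    """Hypothetical iteration step; stands in for one pass of the algorithm."""
    return [v * 2 + 1 for v in values]


def iterate_with_cache(initial, num_steps):
    """One job per step, with no write-out and read-back in between."""
    session = CachingSession()
    session.put("state", initial)
    for _ in range(num_steps):
        session.run_job(step, "state", "state")
    return session.get("state"), session.jobs_run
```

The per-job startup overhead would still be there, since each step is still its own job, but the cost of writing and re-reading the intermediate results between steps would go away.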

Best,
Gábor

[1] https://issues.apache.org/jira/browse/FLINK-1730
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Iteration-Intermediate-Output-td11850.html
[3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Questions-re-ExecutionGraph-amp-ResultPartitions-for-interactive-use-a-la-Spark-td4154.html
[4] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-programm-with-for-loop-yields-wrong-results-when-run-in-parallel-td7783.html
[5] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Iterative-queries-on-Flink-td3786.html


Re: Nested iterations

Supun Kamburugamuve
Thanks Gabor. I'll keep an eye on the developments.

Supun..
