Hi,
Does Flink support nested iterations? We are trying to develop a complex machine learning algorithm which has 3 iterations nested.
Best, Supun.. |
Hello Supun,
Unfortunately, nesting of Flink's iteration constructs is not supported at the moment.

There are some workarounds though:

1. You can start a Flink job for each step of the iteration. Starting a Flink job has some overhead, so this only works if there is a sufficient amount of work in each iteration step. Moreover, this has the disadvantage that the intermediate results always need to be written out and then read back between steps, which might have a considerable performance impact.

2. If you have just a small, fixed number of steps, then you can have a for loop that "unrolls" all the iteration steps and creates one large Flink job. The code will be somewhat similar to the first approach, but you don't call execute between the steps, and you don't write intermediate results to a sink; you just use the DataSet from the previous step. The disadvantage of this is that you might end up with a Flink job that is too large, which might also hurt performance.

Best,
Gábor

2016-09-01 18:09 GMT+02:00 Supun Kamburugamuve <[hidden email]>:
> Hi,
>
> Does Flink support nested iterations? We are trying to develop a complex
> machine learning algorithm which has 3 iterations nested.
>
> Best,
> Supun..
|
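A minimal, Flink-free sketch of the second workaround may help make the idea concrete. It only models the pattern: the three nested loops run in the driver program and do nothing but chain steps onto a deferred plan, which is then run once at the end. The class name `UnrolledNestedLoops`, the `halve` step, and the loop bounds are hypothetical; in an actual Flink program the plan would presumably be a DataSet that each pass transforms (for example with `map`), with a single execute call after the loops.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UnrolledNestedLoops {

    /**
     * Unrolls three nested loops into one deferred plan.
     * Nothing is computed here; each pass only appends one step to the plan,
     * so the plan ends up with outer * middle * inner chained steps.
     */
    static Function<List<Double>, List<Double>> buildPlan(int outer, int middle, int inner) {
        Function<List<Double>, List<Double>> plan = Function.identity();
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < middle; j++) {
                for (int k = 0; k < inner; k++) {
                    // Reuse the result of the previous step directly,
                    // without writing it out in between.
                    plan = plan.andThen(UnrolledNestedLoops::halve);
                }
            }
        }
        return plan;
    }

    /** Hypothetical step body (an assumption for illustration): halve every element. */
    static List<Double> halve(List<Double> values) {
        return values.stream().map(x -> x / 2.0).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 2 * 2 * 2 = 8 chained steps, executed in one go.
        Function<List<Double>, List<Double>> plan = buildPlan(2, 2, 2);
        System.out.println(plan.apply(Arrays.asList(256.0, 512.0))); // prints [1.0, 2.0]
    }
}
```

The size of the plan grows with the product of the loop bounds, which is the "too large job" drawback described above.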
Thanks Gabor. I was thinking about starting separate jobs.

Are there any plans to support nested loops in the future?

Thanks,
Supun..

On Thu, Sep 1, 2016 at 12:28 PM, Gábor Gévay <[hidden email]> wrote:
> Hello Supun,

--
Supun Kamburugamuve
Member, Apache Software Foundation; http://www.apache.org
E-mail: [hidden email]; Mobile: +1 812 219 2563
|
I don't think that there are plans for enabling the nesting of the native iteration constructs, but we should wait for one of the committers to confirm this.

However, the matter of caching intermediate results has come up on numerous occasions before [1,2,3,4,5], and it would be useful in lots of other situations as well, so there is hope that it will be implemented some day, which would make workaround 1 from above more feasible.

Best,
Gábor

[1] https://issues.apache.org/jira/browse/FLINK-1730
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Iteration-Intermediate-Output-td11850.html
[3] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Questions-re-ExecutionGraph-amp-ResultPartitions-for-interactive-use-a-la-Spark-td4154.html
[4] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-programm-with-for-loop-yields-wrong-results-when-run-in-parallel-td7783.html
[5] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Iterative-queries-on-Flink-td3786.html

2016-09-01 18:31 GMT+02:00 Supun Kamburugamuve <[hidden email]>:
> Thanks Gabor. I was thinking about starting separate jobs.
>
> Is there any plans to support nested loops in the future?
>
> Thanks,
> Supun..
|
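The first workaround from earlier in the thread, together with the write-out and read-back cost that caching of intermediate results would avoid, can be modelled roughly as follows. This is a Flink-free sketch under stated assumptions: each call to `runStep` stands in for one separately submitted job, temporary files stand in for the sink and source between jobs, and the class name `JobPerStep`, the halving step, and the file layout are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JobPerStep {

    /** One "job": read the previous result, apply a hypothetical step, write the new result. */
    static void runStep(Path input, Path output) {
        try {
            List<String> next = Files.readAllLines(input).stream()
                    // Hypothetical step body (an assumption): halve every value.
                    .map(line -> Double.toString(Double.parseDouble(line) / 2.0))
                    .collect(Collectors.toList());
            Files.write(output, next); // intermediate result is materialized every time
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Driver: the nested loops live here, and every innermost pass is a separate job. */
    static List<String> runAll(List<String> initial, int outer, int inner) {
        try {
            Path current = Files.createTempFile("step-", ".txt");
            Files.write(current, initial);
            for (int i = 0; i < outer; i++) {
                for (int j = 0; j < inner; j++) {
                    Path next = Files.createTempFile("step-", ".txt");
                    runStep(current, next); // per-job overhead is paid on each pass
                    Files.delete(current);
                    current = next;
                }
            }
            List<String> result = Files.readAllLines(current);
            Files.delete(current);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("8.0", "16.0"), 2, 1)); // prints [2.0, 4.0]
    }
}
```

Every pass through the inner loop pays for one write and one read of the full intermediate result, which is why this approach only pays off when each step does a substantial amount of work.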
Thanks Gabor. I'll keep an eye on the developments.

Supun..

On Thu, Sep 1, 2016 at 12:57 PM, Gábor Gévay <[hidden email]> wrote:
> I don't think that there are plans for enabling the nesting of the
|