Nested Iterations supported in Flink?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Nested Iterations supported in Flink?

Benoît Hanotte
Hello,

I'm implementing an algorithm which requires nested iterations, and, from  
what I understood, this feature was not yet available in Flink [1], and my  
experiments with 2 nested bulk iterations seem to confirm that. However I  
came across a Flink unit test [2] using nested iterations, so I'm now a  
bit confused. Could you please give me some insight on what is supported  
or not in the current state of the project?

Thanks!

Benoît.

[1]  
http://mail-archives.apache.org/mod_mbox/flink-user/201411.mbox/%3CCANC1h_tpNFWwrnm14+Et1yBvkQPQ1-pfk=iUPP5Un90ZgUGeyg@...%3E
[2]  
https://github.com/apache/flink/blob/master/flink-optimizer/src/test/java/org/apache/flink/optimizer/NestedIterationsTest.java
Reply | Threaded
Open this post in threaded view
|

Re: Nested Iterations supported in Flink?

Stephan Ewen
Hi Benoît!

You are right, the nested iterations are currently not supported.

The test you found actually checks that the Optimizer gives a good error message when encountering nested iterations.

Can you write your program as one iterations (the inner) and start the program multiple times to simulate the nesting?

Greetings,
Stephan


On Tue, Apr 14, 2015 at 8:11 AM, Benoît Hanotte <[hidden email]> wrote:
Hello,

I'm implementing an algorithm which requires nested iterations, and, from what I understood, this feature was not yet available in Flink [1], and my experiments with 2 nested bulk iterations seem to confirm that. However I came across a Flink unit test [2] using nested iterations, so I'm now a bit confused. Could you please give me some insight on what is supported or not in the current state of the project?

Thanks!

Benoît.

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201411.mbox/%3CCANC1h_tpNFWwrnm14+Et1yBvkQPQ1-pfk=iUPP5Un90ZgUGeyg@....com%3E
[2] https://github.com/apache/flink/blob/master/flink-optimizer/src/test/java/org/apache/flink/optimizer/NestedIterationsTest.java

Reply | Threaded
Open this post in threaded view
|

Re: Nested Iterations supported in Flink?

Till Rohrmann-2
If your inner iterations happens to work only on the data of a single partition, then you can also implement this iteration as part of a mapPartition operator. The only problem there would be that you have to keep all the partition's data on the heap, if you need access to it.

Cheers,

Till

On Tue, Apr 14, 2015 at 3:11 PM, Stephan Ewen <[hidden email]> wrote:
Hi Benoît!

You are right, the nested iterations are currently not supported.

The test you found actually checks that the Optimizer gives a good error message when encountering nested iterations.

Can you write your program as one iterations (the inner) and start the program multiple times to simulate the nesting?

Greetings,
Stephan


On Tue, Apr 14, 2015 at 8:11 AM, Benoît Hanotte <[hidden email]> wrote:
Hello,

I'm implementing an algorithm which requires nested iterations, and, from what I understood, this feature was not yet available in Flink [1], and my experiments with 2 nested bulk iterations seem to confirm that. However I came across a Flink unit test [2] using nested iterations, so I'm now a bit confused. Could you please give me some insight on what is supported or not in the current state of the project?

Thanks!

Benoît.

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201411.mbox/%3CCANC1h_tpNFWwrnm14+Et1yBvkQPQ1-pfk=iUPP5Un90ZgUGeyg@....com%3E
[2] https://github.com/apache/flink/blob/master/flink-optimizer/src/test/java/org/apache/flink/optimizer/NestedIterationsTest.java


Reply | Threaded
Open this post in threaded view
|

Re: Nested Iterations supported in Flink?

Benoît Hanotte
Thanks for you quick answers!

The algorithm is the following: I've got a spatial set of data and I want to find dense regions. The space is beforehand discretized into "cells" of a fixed size. Then, for each dense cell (1st iteration), starting with the most dense, the algorithm tries to extend it with neighboring cells (the nested iteration) until the density of the obtained extended cell is under a threshold (and some other conditions). When that's done, the used cells are removed from the input set, and the next most dense cell is extended.

Would running the program multiple times be efficient in that case (I would have to save the datasets and reload them probably)? As for using the mapPartition operator I think it wouldn't be possible since the nested iteration needs the entire cells datasets.

Best,

Benoît 


Le Tue, 14 Apr 2015 15:16:48 +0200, Till Rohrmann <[hidden email]> a écrit:

If your inner iterations happens to work only on the data of a single partition, then you can also implement this iteration as part of a mapPartition operator. The only problem there would be that you have to keep all the partition's data on the heap, if you need access to it.

Cheers,

Till

On Tue, Apr 14, 2015 at 3:11 PM, Stephan Ewen <[hidden email]> wrote:
Hi Benoît!

You are right, the nested iterations are currently not supported.

The test you found actually checks that the Optimizer gives a good error message when encountering nested iterations.

Can you write your program as one iterations (the inner) and start the program multiple times to simulate the nesting?

Greetings,
Stephan


On Tue, Apr 14, 2015 at 8:11 AM, Benoît Hanotte <[hidden email]> wrote:
Hello,

I'm implementing an algorithm which requires nested iterations, and, from what I understood, this feature was not yet available in Flink [1], and my experiments with 2 nested bulk iterations seem to confirm that. However I came across a Flink unit test [2] using nested iterations, so I'm now a bit confused. Could you please give me some insight on what is supported or not in the current state of the project?

Thanks!

Benoît.

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201411.mbox/%3CCANC1h_tpNFWwrnm14+Et1yBvkQPQ1-pfk=iUPP5Un90ZgUGeyg@...%3E
[2] https://github.com/apache/flink/blob/master/flink-optimizer/src/test/java/org/apache/flink/optimizer/NestedIterationsTest.java





Reply | Threaded
Open this post in threaded view
|

Re: Nested Iterations supported in Flink?

Stephan Ewen
I think running the program multiple times is a reasonable way to start working on this.

I would try and see whether this can be re-written to a non-nested iterations case. Nestes iterations algorithms may have much more overhead to start with.

Stephan


On Tue, Apr 14, 2015 at 3:53 PM, Benoît Hanotte <[hidden email]> wrote:
Thanks for you quick answers!

The algorithm is the following: I've got a spatial set of data and I want to find dense regions. The space is beforehand discretized into "cells" of a fixed size. Then, for each dense cell (1st iteration), starting with the most dense, the algorithm tries to extend it with neighboring cells (the nested iteration) until the density of the obtained extended cell is under a threshold (and some other conditions). When that's done, the used cells are removed from the input set, and the next most dense cell is extended.

Would running the program multiple times be efficient in that case (I would have to save the datasets and reload them probably)? As for using the mapPartition operator I think it wouldn't be possible since the nested iteration needs the entire cells datasets.

Best,

Benoît 


Le Tue, 14 Apr 2015 15:16:48 +0200, Till Rohrmann <[hidden email]> a écrit:

If your inner iterations happens to work only on the data of a single partition, then you can also implement this iteration as part of a mapPartition operator. The only problem there would be that you have to keep all the partition's data on the heap, if you need access to it.

Cheers,

Till

On Tue, Apr 14, 2015 at 3:11 PM, Stephan Ewen <[hidden email]> wrote:
Hi Benoît!

You are right, the nested iterations are currently not supported.

The test you found actually checks that the Optimizer gives a good error message when encountering nested iterations.

Can you write your program as one iterations (the inner) and start the program multiple times to simulate the nesting?

Greetings,
Stephan


On Tue, Apr 14, 2015 at 8:11 AM, Benoît Hanotte <[hidden email]> wrote:
Hello,

I'm implementing an algorithm which requires nested iterations, and, from what I understood, this feature was not yet available in Flink [1], and my experiments with 2 nested bulk iterations seem to confirm that. However I came across a Flink unit test [2] using nested iterations, so I'm now a bit confused. Could you please give me some insight on what is supported or not in the current state of the project?

Thanks!

Benoît.

[1] http://mail-archives.apache.org/mod_mbox/flink-user/201411.mbox/%3CCANC1h_tpNFWwrnm14+Et1yBvkQPQ1-pfk=iUPP5Un90ZgUGeyg@...%3E
[2] https://github.com/apache/flink/blob/master/flink-optimizer/src/test/java/org/apache/flink/optimizer/NestedIterationsTest.java