(DEPRECATED) Apache Flink User Mailing List archive.

Nested Iterations Outlook

11 messages

Maximilian Alber

Nested Iterations Outlook

Hi Flinksters,

as far as I know, there is still no support for nested iterations planned. Am I right?

So my question is how such use cases should be handled in the future.

More specifically: once pinning/caching is available, do you suggest using that feature and programming in "Spark" style? Or is some other, more flexible mechanism planned for loops?

Cheers,

Max

Maximilian Michels

Re: Nested Iterations Outlook

Hi Max,

You are right, there is no support for nested iterations yet. As far as I know, there are no concrete plans to add it. So it is open to debate what support for resuming from intermediate results will look like. Intermediate results are not produced within iteration cycles. The same would be true for nested iterations, so resuming from intermediate results should behave the same way for nested iterations.

Cheers,

Max

In reply to Maximilian Alber, Fri, Jul 17, 2015 at 4:26 PM.

Maximilian Alber

Re: Nested Iterations Outlook

Thanks for the answer! But I need some clarification:

"So it is open to debate what support for resuming from intermediate results will look like." -> What's the current state of that debate?
"Intermediate results are not produced within iteration cycles." -> OK, if there are none, what does that have to do with the debate? :-)

Cheers,

Max

In reply to Maximilian Michels, Mon, Jul 20, 2015 at 10:50 AM.

Maximilian Michels

Re: Nested Iterations Outlook

"So it is open to debate what support for resuming from intermediate results will look like." -> What's the current state of that debate?

Since, as far as I know, there is no support for nested iterations, the debate about how intermediate results would be integrated has not started yet.

"Intermediate results are not produced within iteration cycles." -> OK, if there are none, what does that have to do with the debate? :-)

I was referring to the existing support for intermediate results within iterations. If we were to implement nested iterations, this could (possibly) change. This is all very theoretical because there are no plans to support nested iterations.

Hope this clarifies. Otherwise, please restate your question because I might have misunderstood.

Cheers,

Max

In reply to Maximilian Alber, Mon, Jul 20, 2015 at 12:11 PM.

Maximilian Alber

Re: Nested Iterations Outlook

Oh sorry, my fault. When I wrote it, I had iterations in mind.

What I actually wanted to ask is how "resuming from intermediate results" will work with (non-nested) "non-Flink" iterations. By iterations I mean something like this:

while (...):
  - change params
  - submit to cluster

where the executed Flink program is more or less the same in each iteration, but with changing input sets that are reused across loop iterations.
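A minimal Python sketch of the driver-side loop described above may help make the question concrete. It is not Flink code: InputSource, PinnedResults, submit_job and driver_loop are hypothetical names, and the sketch assumes that "pinning" means keeping an already computed input set alive between job submissions so that later submissions reuse it instead of recomputing it.

```python
from typing import Callable, Dict, List, Optional


class InputSource:
    """Hypothetical stand-in for an expensive input data set
    (for example, reading and preprocessing a large file)."""

    def __init__(self) -> None:
        self.compute_calls = 0  # how often the expensive computation ran

    def compute(self) -> List[float]:
        self.compute_calls += 1
        return [float(x) for x in range(1, 6)]


class PinnedResults:
    """Hypothetical cache that keeps named results alive between job
    submissions, i.e. the role pinning/caching is assumed to play."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def get_or_compute(self, key: str, fn: Callable[[], List[float]]) -> List[float]:
        if key not in self._store:
            self._store[key] = fn()
        return self._store[key]


def submit_job(data: List[float], scale: float) -> float:
    """Hypothetical 'Flink program': same logic each time, only the parameter changes."""
    return sum(x * scale for x in data)


def driver_loop(source: InputSource, params: List[float],
                cache: Optional[PinnedResults] = None) -> List[float]:
    results = []
    for scale in params:  # while (...): change params
        if cache is None:
            data = source.compute()  # input recomputed for every submission
        else:
            data = cache.get_or_compute("input", source.compute)  # reused
        results.append(submit_job(data, scale))  # submit to cluster
    return results


params = [1.0, 0.5, 0.25]

plain = InputSource()
print(driver_loop(plain, params))  # [15.0, 7.5, 3.75]
print(plain.compute_calls)         # 3

pinned = InputSource()
print(driver_loop(pinned, params, PinnedResults()))  # [15.0, 7.5, 3.75]
print(pinned.compute_calls)                          # 1
```

With the cache, the expensive input is computed once instead of once per loop iteration; whether Flink's pinning would behave exactly like this was still open at this point in the thread.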

I might have gotten something wrong, because in our group we talked about caching à la Spark for Flink, and someone said that "pinning" will do that. Is that roughly right?

Thanks and Cheers,

Max

In reply to Maximilian Michels, Mon, Jul 20, 2015 at 1:06 PM.

Maximilian Michels

Re: Nested Iterations Outlook

Now that makes more sense :) I thought by "nested iterations" you meant iterations in Flink that can be nested, i.e. starting an iteration inside an iteration.

The caching/pinning of intermediate results is still a work in progress in Flink. It is actually in a state where it could be merged, but some pending pull requests were delayed because priorities changed a bit.

Essentially, we need to merge these two pull requests:

https://github.com/apache/flink/pull/858

This introduces session management, which allows keeping the ExecutionGraph for the duration of a session.

https://github.com/apache/flink/pull/640

This implements the actual backtracking and caching of the results.

Once these are in, we can change the Java/Scala API to support backtracking. I don't know exactly how Spark's API does it, but essentially it should then work by creating new operations on an existing DataSet and submitting to the cluster again.
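The backtracking idea can be illustrated with a small, hypothetical Python sketch. It is not the implementation in pull request 640 or 858: Op, Session and execute are invented names, and for simplicity the sketch assumes that every operator result is kept in memory for the whole session, keyed by a unique operator name. A new job is evaluated starting at its sink and walks back through its inputs only until it reaches a result that an earlier job already produced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical operator: a unique name, a function, and its input operators."""
    name: str
    fn: Callable[..., object]
    inputs: Tuple["Op", ...] = ()


@dataclass
class Session:
    """Hypothetical session that keeps operator results alive across job submissions."""
    cache: Dict[str, object] = field(default_factory=dict)
    executed: List[str] = field(default_factory=list)  # operators that actually ran

    def execute(self, sink: Op) -> object:
        # Backtrack from the sink toward the sources; a cached result
        # stops the walk, so nothing upstream of it is recomputed.
        if sink.name in self.cache:
            return self.cache[sink.name]
        args = [self.execute(op) for op in sink.inputs]
        result = sink.fn(*args)
        self.executed.append(sink.name)
        self.cache[sink.name] = result
        return result


# First job: source -> evens -> sum
source = Op("source", lambda: list(range(10)))
evens = Op("evens", lambda xs: [x for x in xs if x % 2 == 0], (source,))
total = Op("sum", sum, (evens,))

session = Session()
print(session.execute(total))  # 20
print(session.executed)        # ['source', 'evens', 'sum']

# Second job: a new operation on the existing "evens" result
maximum = Op("max", max, (evens,))
print(session.execute(maximum))  # 8
print(session.executed)          # ['source', 'evens', 'sum', 'max']
```

The second job only runs the new max operator because the evens result was kept from the first job. A real runtime would presumably also need to decide which results to keep and what to do when a cached result is no longer available.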

Cheers,

Max

In reply to Maximilian Alber, Mon, Jul 20, 2015 at 3:31 PM.

Maximilian Alber

Re: Nested Iterations Outlook

Thanks!

OK, cool. If I want to test it, do I just need to merge those two pull requests into my current branch?

Cheers,

Max

In reply to Maximilian Michels, Mon, Jul 20, 2015 at 4:02 PM.

Maximilian Michels

Re: Nested Iterations Outlook

You could do that, but you might run into merge conflicts. Also keep in mind that it is a work in progress :)

In reply to Maximilian Alber, Mon, Jul 20, 2015 at 4:15 PM.

Stephan Ewen

Re: Nested Iterations Outlook

Unfortunately, the two pull requests do not go all the way. They cover only the runtime; the API integration part is still missing...

In reply to Maximilian Michels, Mon, Jul 20, 2015 at 5:53 PM.

Maximilian Michels

Re: Nested Iterations Outlook

I mentioned that. @Max: you should only try it out if you want to experiment/work with the changes.

In reply to Stephan Ewen, Wed, Jul 22, 2015 at 2:20 PM.

Maximilian Alber

Re: Nested Iterations Outlook

Thanks.

Yes, I got that.

Cheers

In reply to Maximilian Michels, Wed, Jul 22, 2015 at 2:46 PM.