Forced to use Solution Set in Step Function


Maximilian Alber
Hi Flinksters!

I would like to use the iterateDelta function. I don't need the solution set inside the step function, because I generate different values out of the working set. Unfortunately the compiler of the development version doesn't like that. Is there a workaround?

The code:

// Vector, VectorDataSet and config are user-defined helpers; RichFilterFunction
// and RichMapFunction come from org.apache.flink.api.common.functions.
val residual_2a = residual_2 union env.fromCollection(Seq(Vector.zeros(config.dimensions)))
val emptyDataSet = env.fromCollection[Vector](Seq())
val cumSum = emptyDataSet.iterateDelta(residual_2a, 1000000, Array("id")) {
  (solutionset, workset) =>
    val old_sum = workset filter { _.id == -1 }
    val current = workset filter (new RichFilterFunction[Vector] {
      def filter(x: Vector) = x.id == getIterationRuntimeContext.getSuperstepNumber
    })
    val residual_2 = workset filter { _.id != -1 }
    val sum = VectorDataSet.add(old_sum, current)

    // returns (solution set delta, next workset)
    (sum map (new RichMapFunction[Vector, Vector] {
      def map(x: Vector) = new Vector(getIterationRuntimeContext.getSuperstepNumber, x.values)
    }),
    residual_2 union sum)
}

The error:

org.apache.flink.compiler.CompilerException: Error: The step function does not reference the solution set.
at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:868)
at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:622)
at org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
at org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:202)
at org.apache.flink.api.common.operators.GenericDataSinkBase.accept(GenericDataSinkBase.java:286)
at org.apache.flink.api.common.Plan.accept(Plan.java:281)
at org.apache.flink.compiler.PactCompiler.compile(PactCompiler.java:517)
at org.apache.flink.compiler.PactCompiler.compile(PactCompiler.java:466)
at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:196)
at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:209)
at org.apache.flink.client.program.Client.run(Client.java:285)
at org.apache.flink.client.program.Client.run(Client.java:230)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)

Thanks!
Cheers,
Max

Re: Forced to use Solution Set in Step Function

Stephan Ewen
Hey!

Is the algorithm you are using in fact a delta iteration? If you do not actually use the solution set, can you model it as a bulk iteration?

If you actually need the solution set to accumulate data, we can probably deactivate that check in the compiler. As far as I remember, there is no requirement in the runtime to join with the solution set. The check is meant to help programmers who forgot the join...

Greetings,
Stephan
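
(For reference: a bulk iteration threads a single DataSet through the step function, so no solution set is involved. A minimal sketch in the Scala API; the input values and the halving step are made up for illustration.)

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val numbers = env.fromElements(1.0, 2.0, 3.0)

// iterate() feeds each step's result back in as the next step's input
val halved = numbers.iterate(10) { current =>
  current map { _ / 2.0 }
}

halved.print()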



Re: Forced to use Solution Set in Step Function

Maximilian Alber
Hi!

Hmm, I don't think so. I have two datasets which I cannot really merge. After some thinking, this was the only solution I came up with for my problem:
I have a DataSet of Vectors (in this case each of length one); each Vector has an id and an array of values. From that I would like to compute the prefix sums, i.e. the cumulative sums. To do that I need to keep both the dataset with the vectors and the dataset where I store the sums.

In the Scala version, could I use a dataset inside the iteration without passing it as solution set or workset, just via closure?

Maybe a flag to disable the check would be suitable?

Thanks!
Cheers,
Max
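
To make the goal concrete, here is the same computation written sequentially in plain Scala (the (id, value) pairs are made up for illustration):

val values = Seq(1 -> 2.0, 2 -> 3.0, 3 -> 1.0)
val prefixSums = values.scanLeft(0 -> 0.0) {
  case ((_, acc), (id, v)) => id -> (acc + v)
}.tail
// prefixSums == Seq(1 -> 2.0, 2 -> 5.0, 3 -> 6.0)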

Re: Forced to use Solution Set in Step Function

Aljoscha Krettek
Hmm, what this really needs is a different kind of iteration primitive:
basically a bulk iteration where you can output values in each
iteration that get collected.

Re: Forced to use Solution Set in Step Function

Maximilian Alber
It just came to my mind: is it possible to use broadcast sets inside the iteration functions with datasets which are "located" outside of it (via closure)?

The basic shape of my iteration is: I have a dataset which gets altered and is needed in each iteration, i.e. a working set. In my case I also have a constant dataset which does not get modified (keeping it around messes up the code) and a resulting dataset which is not needed inside the step function.
So it is similar to iterate with delta.

Cheers,
Max


Re: Forced to use Solution Set in Step Function

Aljoscha Krettek
Yes, you can refer to outside datasets in an iteration.
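
A sketch of how that can look, combining a closure reference with a broadcast set so every worker sees the outside data (names and values are made up, and this is untested against that Flink version):

import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._

val env = ExecutionEnvironment.getExecutionEnvironment
val data = env.fromElements(1.0, 2.0, 3.0)
val constants = env.fromElements(10.0) // the "outside" dataset

val result = data.iterate(5) { current =>
  current.map(new RichMapFunction[Double, Double] {
    private var offset = 0.0
    override def open(parameters: Configuration): Unit = {
      // the broadcast DataSet is materialized on every worker
      offset = getRuntimeContext.getBroadcastVariable[Double]("constants").asScala.head
    }
    override def map(x: Double) = x + offset
  }).withBroadcastSet(constants, "constants")
}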

Re: Forced to use Solution Set in Step Function

Maximilian Alber
But no changing?

Re: Forced to use Solution Set in Step Function

Aljoscha Krettek
Correct, no changing. DataSets can never be modified, in fact. (Except
the solution set in a delta iteration, but there the user is also not
explicitly changing a DataSet.)
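
(In other words, transformations never mutate their input; they define new DataSets. A trivial sketch, assuming the usual env setup:

val a = env.fromElements(1, 2, 3)
val b = a map { _ * 2 } // 'a' is untouched; 'b' is a new DataSet
)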

Re: Forced to use Solution Set in Step Function

Stephan Ewen

I think we can disable the check. Can you open an issue for that?

As a quick fix, is it possible for you to compile yourself a version where the check is commented out?

Re: Forced to use Solution Set in Step Function

Maximilian Alber
I will open an issue tomorrow.
And I will try the quick fix, thanks.

Cheers
Max

Re: Forced to use Solution Set in Step Function

Maximilian Alber
I opened an issue.

After compiling with the check commented out (actually the whole block, lines 867-900), I get a NullPointerException:

java.lang.NullPointerException
at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:870)
at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:622)
at org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
at org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:202)
at org.apache.flink.api.common.operators.GenericDataSinkBase.accept(GenericDataSinkBase.java:286)
at org.apache.flink.api.common.Plan.accept(Plan.java:281)
at org.apache.flink.compiler.PactCompiler.compile(PactCompiler.java:517)
at org.apache.flink.compiler.PactCompiler.compile(PactCompiler.java:466)
at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:196)
at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:209)
at org.apache.flink.client.program.Client.run(Client.java:285)
at org.apache.flink.client.program.Client.run(Client.java:230)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)

Re: Forced to use Solution Set in Step Function

Stephan Ewen

Thank you, I will look into that...

Re: Forced to use Solution Set in Step Function

Maximilian Alber
Ok, thanks.
Please let me know when it is fixed.

Cheers
Max

Re: Forced to use Solution Set in Step Function

Maximilian Alber
Should work now.
Cheers

Re: Forced to use Solution Set in Step Function

Maximilian Alber
Hmm, or maybe not. With this code I get a strange error:

def createPlan_find_center(env: ExecutionEnvironment) = {
  val X = env readTextFile config.xFile map { Vector.parseFromString(config.dimensions, _) }
  val residual = env readTextFile config.yFile map { Vector.parseFromString(_) }
  val randoms = env readTextFile config.randomFile map { Vector.parseFromString(_) }

  val residual_2 = residual * residual
  val ys = (residual_2 sumV) * (randoms filter { _.id == 0 })

  val emptyDataSet = env.fromCollection[Vector](Seq())
  val sumVector = env.fromCollection(Seq(Vector.zeros(config.dimensions)))
  val cumSum = emptyDataSet.iterateDelta(sumVector, config.N, Array("id")) {
    (solutionset, old_sum) =>
      // residual_2 is referenced from outside the iteration via closure
      val current = residual_2 filter (new RichFilterFunction[Vector] {
        def filter(x: Vector) = x.id == getIterationRuntimeContext.getSuperstepNumber
      })
      val sum = VectorDataSet.add(old_sum, current)

      (sum map (new RichMapFunction[Vector, Vector] {
        def map(x: Vector) = new Vector(getIterationRuntimeContext.getSuperstepNumber, x.values)
      }),
      sum)
  }

Error:
10/14/2014 15:57:35: Job execution switched to status RUNNING
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to RUNNING
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to RUNNING
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to RUNNING
10/14/2014 15:57:35: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to FAILED
java.lang.IllegalStateException: This stub is not part of an iteration step function.
at org.apache.flink.api.common.functions.AbstractRichFunction.getIterationRuntimeContext(AbstractRichFunction.java:59)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:40)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:39)
at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:47)
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:79)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5.join(joinDataSet.scala:184)
at org.apache.flink.runtime.operators.hash.BuildFirstHashMatchIterator.callWithNextKey(BuildFirstHashMatchIterator.java:140)
at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:148)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:484)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:359)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
at java.lang.Thread.run(Thread.java:745)

10/14/2014 15:57:36: Job execution switched to status FAILING
10/14/2014 15:57:36: DataSource ([]) (1/1) switched to CANCELING
10/14/2014 15:57:36: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELING
10/14/2014 15:57:36: DataSource ([-1 0.0]) (1/1) switched to CANCELING
10/14/2014 15:57:36: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELING
10/14/2014 15:57:36: Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSink(TextOutputFormat (/tmp/tmplSYJ7S) - UTF-8) (1/1) switched to CANCELED
10/14/2014 15:57:36: Sync (WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELING
10/14/2014 15:57:36: Map (bumpboost.BumpBoost$$anonfun$8$$anon$2) (1/1) switched to CANCELED
10/14/2014 15:57:36: SolutionSet Delta (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSource ([]) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSource ([-1 0.0]) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELED
10/14/2014 15:57:36: Job execution switched to status FAILED
Error: The program execution failed: java.lang.IllegalStateException: This stub is not part of an iteration step function.
at org.apache.flink.api.common.functions.AbstractRichFunction.getIterationRuntimeContext(AbstractRichFunction.java:59)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:40)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:39)
at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:47)
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:79)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5.join(joinDataSet.scala:184)
at org.apache.flink.runtime.operators.hash.BuildFirstHashMatchIterator.callWithNextKey(BuildFirstHashMatchIterator.java:140)
at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:148)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:484)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:359)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
at java.lang.Thread.run(Thread.java:745)

Re: Forced to use Solution Set in Step Function

Fabian Hueske
Hi,

I'm not super familiar with the iterations, but my guess would be that the filter is not evaluated as part of the iteration.
Since it is not connected to the workset, the filter is not part of the loop and is evaluated once outside of it, where no superstep number is available.
I guess moving the filter outside of the loop would give the same error.

Cheers, Fabian
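
One untested way to illustrate the point (all names and data made up): an operator ends up inside the loop only if its inputs connect to the workset or solution set. Below, the outside dataset is first crossed with the workset purely to create that dependency, so the subsequent RichFilterFunction runs inside the iteration and may ask for the superstep number:

import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichFilterFunction

val env = ExecutionEnvironment.getExecutionEnvironment
val outside = env.fromElements(1, 2, 3, 4, 5)
val initial = env.fromElements(0)

val result = initial.iterate(3) { workset =>
  val insideLoop = outside
    .cross(workset).map(_._1) // the cross only ties the operator to the workset
    .filter(new RichFilterFunction[Int] {
      override def filter(x: Int): Boolean =
        x == getIterationRuntimeContext.getSuperstepNumber
    })
  workset union insideLoop
}

Note that the cross duplicates elements as the workset grows; it is only meant to show where the operator executes, not to be efficient.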



2014-10-14 16:18 GMT+02:00 Maximilian Alber <[hidden email]>:
Hmm or maybe not. With this code I get some strange error:

def createPlan_find_center(env: ExecutionEnvironment) = {
val X = env readTextFile config.xFile map {Vector.parseFromString(config.dimensions, _)};
val residual = env readTextFile config.yFile map {Vector.parseFromString(_)};
val randoms = env readTextFile config.randomFile map {Vector.parseFromString(_)}

val residual_2 = residual * residual
val ys = (residual_2 sumV) * (randoms filter {_.id == 0})

val emptyDataSet = env.fromCollection[Vector](Seq())
val sumVector = env.fromCollection(Seq(Vector.zeros(config.dimensions)))
val cumSum = emptyDataSet.iterateDelta(sumVector, config.N, Array("id")) {
    (solutionset, old_sum) =>
    val current = residual_2 filter (new RichFilterFunction[Vector]{
      def filter(x: Vector) = x.id == (getIterationRuntimeContext.getSuperstepNumber)
    })
    val sum = VectorDataSet.add(old_sum, current)

    (sum map (new RichMapFunction[Vector, Vector]{
      def map(x: Vector) = new Vector(getIterationRuntimeContext.getSuperstepNumber, x.values)
    }),
    sum)
}

Error:
10/14/2014 15:57:35: Job execution switched to status RUNNING
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: DataSource ([-1 0.0]) (1/1) switched to RUNNING
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to DEPLOYING
10/14/2014 15:57:35: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:35: DataSource ([]) (1/1) switched to RUNNING
10/14/2014 15:57:35: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to RUNNING
10/14/2014 15:57:35: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:35: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to SCHEDULED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to DEPLOYING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to RUNNING
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) -> Filter (bumpboost.BumpBoost$$anonfun$8$$anon$1) (1/1) switched to FAILED
java.lang.IllegalStateException: This stub is not part of an iteration step function.
at org.apache.flink.api.common.functions.AbstractRichFunction.getIterationRuntimeContext(AbstractRichFunction.java:59)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:40)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:39)
at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:47)
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:79)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5.join(joinDataSet.scala:184)
at org.apache.flink.runtime.operators.hash.BuildFirstHashMatchIterator.callWithNextKey(BuildFirstHashMatchIterator.java:140)
at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:148)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:484)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:359)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
at java.lang.Thread.run(Thread.java:745)

10/14/2014 15:57:36: Job execution switched to status FAILING
10/14/2014 15:57:36: DataSource ([]) (1/1) switched to CANCELING
10/14/2014 15:57:36: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELING
10/14/2014 15:57:36: DataSource ([-1 0.0]) (1/1) switched to CANCELING
10/14/2014 15:57:36: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELING
10/14/2014 15:57:36: Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSink(TextOutputFormat (/tmp/tmplSYJ7S) - UTF-8) (1/1) switched to CANCELED
10/14/2014 15:57:36: Sync (WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELING
10/14/2014 15:57:36: Map (bumpboost.BumpBoost$$anonfun$8$$anon$2) (1/1) switched to CANCELED
10/14/2014 15:57:36: SolutionSet Delta (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSource ([]) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN DataSource (TextInputFormat (/tmp/tmpBhOsLd) - UTF-8) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: DataSource ([-1 0.0]) (1/1) switched to CANCELED
10/14/2014 15:57:36: CHAIN Join(org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5) -> Map (org.apache.flink.api.scala.DataSet$$anon$1) (1/1) switched to CANCELED
10/14/2014 15:57:36: IterationHead(WorksetIteration (Unnamed Delta Iteration)) (1/1) switched to CANCELED
10/14/2014 15:57:36: Job execution switched to status FAILED
Error: The program execution failed: java.lang.IllegalStateException: This stub is not part of an iteration step function.
at org.apache.flink.api.common.functions.AbstractRichFunction.getIterationRuntimeContext(AbstractRichFunction.java:59)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:40)
at bumpboost.BumpBoost$$anonfun$8$$anon$1.filter(BumpBoost.scala:39)
at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:47)
at org.apache.flink.runtime.operators.chaining.ChainedFlatMapDriver.collect(ChainedFlatMapDriver.java:79)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at org.apache.flink.api.scala.UnfinishedJoinOperation$$anon$5.join(joinDataSet.scala:184)
at org.apache.flink.runtime.operators.hash.BuildFirstHashMatchIterator.callWithNextKey(BuildFirstHashMatchIterator.java:140)
at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:148)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:484)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:359)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
at java.lang.Thread.run(Thread.java:745)

On Tue, Oct 14, 2014 at 3:32 PM, Maximilian Alber <[hidden email]> wrote:
Should work now.
Cheers

On Fri, Oct 10, 2014 at 3:38 PM, Maximilian Alber <[hidden email]> wrote:
Ok, thanks.
Please let me know when it is fixed.

Cheers
Max

On Fri, Oct 10, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> wrote:

Thank you, I will look into that...

Reply | Threaded
Open this post in threaded view
|

Re: Forced to use Solution Set in Step Function

Aljoscha Krettek
Dammit you beat me to it. But yes, this is exactly what I was just writing.

On Tue, Oct 14, 2014 at 4:35 PM, Fabian Hueske <[hidden email]> wrote:

> Hi,
>
> I'm not super familiar with the iterations, but my guess would be that the
> filter is not evaluated as part of the iteration.
> Since it is not connected to the workset, the filter is not part of the loop
> and is evaluated once outside, where no superstep number is available.
> I guess moving the filter outside of the loop would give the same error.
>
> Cheers, Fabian
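
In other words, only operators whose inputs are reachable from the solution set or the workset become part of the step function; a chain built purely from outside data sets is compiled as a plain one-shot data flow next to the iteration. A rough, untested sketch of the distinction (all names here are assumed):

val result = solutionSeed.iterateDelta(worksetSeed, 10, Array("id")) {
  (solution, workset) =>
    // inside the loop: this map reads the workset
    val inLoop = workset map { v => v }
    // outside the loop: this filter reads only a data set defined outside,
    // so it runs once, before the iteration starts
    val outsideOnce = residual_2 filter { _.id >= 0 }
    (inLoop, inLoop)
}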
Reply | Threaded
Open this post in threaded view
|

Re: Forced to use Solution Set in Step Function

Maximilian Alber
Ok, sounds right, but somehow I would like to execute it inside the iteration. So I probably need to do some nonsense work to make the filter part of it?
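
For instance, something like the following untested sketch (inside the step function; names as in the code above, and the id -1 marker refers to the zero vector already in the workset): tying residual_2 to the workset with a dummy cross should make the compiler place the downstream filter inside the loop, so the superstep number becomes available.

// one marker element taken from the workset (the id -1 zero vector)
val marker = workset filter { _.id == -1 }
// residual_2 now depends on the workset, so operators on it
// run as part of the step function
val residualInLoop = marker.cross(residual_2) { (_, r) => r }
val current = residualInLoop filter (new RichFilterFunction[Vector] {
  def filter(x: Vector) = x.id == getIterationRuntimeContext.getSuperstepNumber
})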

On Tue, Oct 14, 2014 at 4:36 PM, Aljoscha Krettek <[hidden email]> wrote:
Dammit you beat me to it. But yes, this is exactly what I was just writing.


Reply | Threaded
Open this post in threaded view
|

Re: Forced to use Solution Set in Step Function

Fabian Hueske
Yep, I see your point.
Conceptually, all data that changes and affects the result of an iteration should be part of the workset.
Hence, the model kind of assumes that a datum like the superstep number should be part of the workset.

I am not familiar with your application, but would it make sense to add the number as an additional attribute to the workset data set and increase it manually?
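
A rough, untested sketch of that idea (the Step type and all names are assumptions, not code from this thread): the counter travels as a field of the workset elements and is increased by hand in every superstep, so nothing needs getIterationRuntimeContext.

import org.apache.flink.api.scala._

// hypothetical element type: `step` is the manually-maintained counter
case class Step(id: Int, step: Int, value: Double)

object ManualCounter {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val solutionSeed = env.fromCollection(Seq(Step(0, 0, 0.0)))
    val worksetSeed  = env.fromCollection(Seq(Step(0, 1, 0.0)))

    val result = solutionSeed.iterateDelta(worksetSeed, 10, Array("id")) {
      (solution, workset) =>
        // increase the counter by hand instead of asking the runtime context
        val next = workset map { s => Step(s.id, s.step + 1, s.value + s.step) }
        (next, next)
    }

    result.writeAsText("/tmp/manual-counter")
    env.execute("manual superstep counter")
  }
}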

2014-10-14 16:45 GMT+02:00 Maximilian Alber <[hidden email]>:
Ok, sounds right, but somehow I would like to execute it inside the iteration. So I probably need to do some nonsense work to make the filter part of it?

On Tue, Oct 14, 2014 at 4:36 PM, Aljoscha Krettek <[hidden email]> wrote:
Dammit you beat me to it. But yes, this is exactly what I was just writing.


Reply | Threaded
Open this post in threaded view
|

Re: Forced to use Solution Set in Step Function

Maximilian Alber
The delta iteration calculates the cumulative sum (prefix sum) of residual_2.

I'm not sure I got your idea. I could add residual_2 to the workset, but in that case I need to separate the "old_sum" (the sum up to now) from residual_2 in each iteration. Actually I had that before; then I tried the Scala closure to clean up the code.
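
For reference, one untested way to sketch that separation, combined with the manual counter idea (the Elem type and every name here are assumptions, not this thread's actual code): tag each workset element with what it is, carry the counter in the sum element's id field, and pick the current element with a join instead of the superstep number.

import org.apache.flink.api.scala._

case class Elem(id: Int, kind: String, value: Double)

object PrefixSumSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val n = 5
    // residuals with ids 0..n-1; the sum starts at id 0, meaning
    // "the residual with id 0 is the next one to add"
    val residuals = env.fromCollection((0 until n) map { i => Elem(i, "residual", i + 1.0) })
    val sumSeed   = env.fromCollection(Seq(Elem(0, "sum", 0.0)))
    val solutionSeed = env.fromCollection(Seq(Elem(-1, "result", 0.0)))

    val cumSum = solutionSeed.iterateDelta(residuals union sumSeed, n, Array("id")) {
      (solution, workset) =>
        val oldSum    = workset filter { _.kind == "sum" }
        val remaining = workset filter { _.kind == "residual" }

        // the current residual is the one whose id equals the counter
        // carried in the sum element -- no superstep number involved
        val newSum = oldSum.join(remaining).where("id").equalTo("id") {
          (s, r) => Elem(s.id + 1, "sum", s.value + r.value)
        }
        // expose the prefix sum for position s.id - 1 in the solution set
        val delta = newSum map { s => Elem(s.id - 1, "result", s.value) }

        (delta, remaining union newSum)
    }

    cumSum.writeAsText("/tmp/prefix-sums")
    env.execute("prefix sum sketch")
  }
}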

On Tue, Oct 14, 2014 at 4:57 PM, Fabian Hueske <[hidden email]> wrote:
Yep, I see your point.
Conceptually, all data that changes and affects the result of an iteration should be part of the workset.
Hence, the model kind of assumes that a datum like the superstep number should be part of the workset.

I am not familiar with your application, but would it make sense to add the number as an additional attribute to the workset data set and increase it manually?



