EOFException when running Flink job

EOFException when running Flink job

Stefan Bunk
Hi Squirrels,

I have some trouble with a delta-iteration transitive closure program [1].
When I run the program, I get the following error:

java.io.EOFException
at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.nextSegment(InMemoryPartition.java:333)
at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeByte(AbstractPagedOutputView.java:223)
at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeLong(AbstractPagedOutputView.java:291)
at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeDouble(AbstractPagedOutputView.java:307)
at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:62)
at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:26)
at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:89)
at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:29)
at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:219)
at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
at java.lang.Thread.run(Thread.java:745)

Both input files were generated and written to HDFS by Flink jobs. I have already re-run the job that generates them several times, but the error persists each time.
You can find the logs at [2] and [3].
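
For context, the program follows the usual delta-iteration transitive closure pattern. Below is a minimal sketch of that pattern, illustrative only and not the actual code from [1]: the HDFS paths are placeholders, and the real records evidently also carry a Double field (see the DoubleSerializer frame in the trace), which is omitted here for brevity.

    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    // Minimal delta-iteration transitive closure sketch (not the actual program from [1]).
    object TransitiveClosureSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Placeholder path; in the real job the edges were written to HDFS by another Flink job.
        val edges: DataSet[(Long, Long)] = env.readCsvFile[(Long, Long)]("hdfs:///path/to/edges")

        val maxIterations = 10

        // The edge set is both the initial solution set and the initial workset;
        // the full (source, target) pair acts as the unique solution-set key.
        val closure = edges.iterateDelta(edges, maxIterations, Array(0, 1)) {
          (solution, workset) =>
            // Extend every known path by one hop over the original edge set.
            val extended = workset.join(edges).where(1).equalTo(0) {
              (path, edge) => (path._1, edge._2)
            }
            // Keep only pairs that are not yet in the solution set.
            val newPaths = extended.coGroup(solution).where(0, 1).equalTo(0, 1) {
              (candidates, existing, out: Collector[(Long, Long)]) =>
                if (existing.isEmpty && candidates.hasNext) {
                  out.collect(candidates.next()) // all candidates in this group are equal pairs
                }
            }
            // The new pairs are both the delta for the solution set and the next workset.
            (newPaths, newPaths)
        }

        closure.writeAsCsv("hdfs:///path/to/closure")
        env.execute("Transitive closure sketch")
      }
    }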

I am using the 0.9.0-milestone-1 release.

Best,
Stefan


Re: EOFException when running Flink job

Stephan Ewen
Hi!

After a quick look at the code, this seems like a bug. One corner case of the overflow handling code does not check for the "running out of memory" condition.

I would like to wait and see whether Robert Waury has any ideas about this; he is the one most familiar with that code.

I would guess, though, that you should be able to work around it either by setting the solution set to "unmanaged" or by slightly changing the memory configuration. It seems to be a rare corner case that is only hit when memory runs out in a very specific situation, so you may be able to avoid it by using slightly more or less memory.
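
To illustrate the first workaround, here is a minimal sketch of keeping the solution set unmanaged. It assumes the Scala API overload of iterateDelta that takes a solutionSetUnManaged flag; with the Java API you would instead call setSolutionSetUnManaged(true) on the DeltaIteration object.

    import org.apache.flink.api.scala._

    // Sketch only: the solution set is kept as regular objects on the heap instead of
    // in Flink's managed memory (the serialized, compacting hash table from the trace).
    object UnmanagedSolutionSetSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val edges: DataSet[(Long, Long)] = env.readCsvFile[(Long, Long)]("hdfs:///path/to/edges")

        // Same delta iteration as before, only with the unmanaged solution set flag set.
        val result = edges.iterateDelta(edges, 10, Array(0, 1), solutionSetUnManaged = true) {
          (solution, workset) =>
            // ... the step function itself stays unchanged ...
            (workset, workset)
        }

        result.writeAsCsv("hdfs:///path/to/output")
        env.execute("Unmanaged solution set sketch")
      }
    }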

Greetings,
Stephan



Re: EOFException when running Flink job

Stefan Bunk
Hi,

I tested three configurations:

taskmanager.heap.mb: 6144, taskmanager.memory.fraction: 0.5
taskmanager.heap.mb: 5544, taskmanager.memory.fraction: 0.6
taskmanager.heap.mb: 5144, taskmanager.memory.fraction: 0.7
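
For reference, each run only changed these two keys in flink-conf.yaml; the comments give the rough managed-memory budget per TaskManager (approximately heap size times fraction, ignoring network buffers):

    # Run 1: ~3072 MB managed memory
    taskmanager.heap.mb: 6144
    taskmanager.memory.fraction: 0.5

    # Run 2: ~3326 MB managed memory
    # taskmanager.heap.mb: 5544
    # taskmanager.memory.fraction: 0.6

    # Run 3: ~3601 MB managed memory
    # taskmanager.heap.mb: 5144
    # taskmanager.memory.fraction: 0.7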

The error occurs in all three configurations.
In the last configuration, I also found another exception in the logs of one of the TaskManagers:

19.Apr. 13:39:29 INFO  Task                 - IterationHead(WorksetIteration (Resolved-Redirects)) (10/10) switched to FAILED : java.lang.IndexOutOfBoundsException: Index: 161, Size: 161
        at java.util.ArrayList.rangeCheck(ArrayList.java:635)
        at java.util.ArrayList.get(ArrayList.java:411)
        at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.resetTo(InMemoryPartition.java:352)
        at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.access$100(InMemoryPartition.java:301)
        at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:226)
        at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
        at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
        at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
        at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
        at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
        at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
        at java.lang.Thread.run(Thread.java:745)

Greetings
Stefan


Re: EOFException when running Flink job

Mihail Vieru
Hi,

I also get the EOFException on 0.9-SNAPSHOT when using a modified SingleSourceShortestPaths on big graphs.

After setting the solution set to "unmanaged", the job finishes. But this only works up to a certain input size; beyond that point, I cannot increase the input further without getting an OutOfMemoryError.

Best,
Mihail


Re: EOFException when running Flink job

Stephan Ewen
Hi!

The "unmanaged solution set" keeps the data outside the Flink managed memory, and is expected to throw OOM exceptions at some point. The managed variant clearly has a bug.

Can you open a JIRA ticket for this, briefly describe where it happens, and paste the stack trace as well?

Stephan
