Hi Squirrels,
I am having some trouble with a delta-iteration transitive closure program [1]. When I run the program, I get the following error:

java.io.EOFException
    at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.nextSegment(InMemoryPartition.java:333)
    at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
    at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeByte(AbstractPagedOutputView.java:223)
    at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeLong(AbstractPagedOutputView.java:291)
    at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeDouble(AbstractPagedOutputView.java:307)
    at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:62)
    at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:26)
    at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:89)
    at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:29)
    at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:219)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
    at java.lang.Thread.run(Thread.java:745)

Both input files have been generated and written to HDFS by Flink jobs. I have already re-run the Flink program that generated them several times, but the error persists. You can find the logs at [2] and [3].

I am using the 0.9.0-milestone-1 release.

Best,
Stefan

[1] Flink job: https://gist.github.com/knub/0bfec859a563009c1d57
[2] Job manager logs: https://gist.github.com/knub/01e3a4b0edb8cde66ff4
[3] One task manager's logs: https://gist.github.com/knub/8f2f953da95c8d7adefc
Hi!

After a quick look over the code, this seems like a bug. One corner case of the overflow handling code does not check for the "running out of memory" condition.

I would like to wait and see whether Robert Waury has some ideas about it; he is the one most familiar with that code.

I would guess, though, that you should be able to work around it by either setting the solution set to "unmanaged", or by slightly changing the memory configuration. It seems to be a rare corner case that only occurs when memory runs out in a very specific situation, so you may be able to avoid it by using slightly more or less memory.

Greetings,
Stephan

On Fri, Apr 17, 2015 at 3:59 PM, Stefan Bunk <[hidden email]> wrote:
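For reference, this is roughly what the "unmanaged solution set" workaround looks like. It is only a minimal, self-contained sketch in the Java DataSet API with made-up data, not the actual job from [1] (which uses the Scala API). The relevant switch is setSolutionSetUnManaged(true) on the DeltaIteration, which keeps the solution set as regular heap objects instead of serialized records in the CompactingHashTable:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.DeltaIteration;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class UnmanagedSolutionSetSketch {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Toy (id, value) records standing in for the real solution set / workset.
            DataSet<Tuple2<Long, Double>> initial = env.fromElements(
                    new Tuple2<>(1L, 0.5), new Tuple2<>(2L, 1.5), new Tuple2<>(3L, 2.5));

            // Delta iteration keyed on field 0, at most 10 supersteps.
            DeltaIteration<Tuple2<Long, Double>, Tuple2<Long, Double>> iteration =
                    initial.iterateDelta(initial, 10, 0);

            // Keep the solution set on the JVM heap instead of in Flink's managed memory,
            // bypassing the CompactingHashTable used for the managed solution set.
            iteration.setSolutionSetUnManaged(true);

            // Trivial step function: re-emit the workset as both the delta and the next workset.
            DataSet<Tuple2<Long, Double>> delta = iteration.getWorkset();
            DataSet<Tuple2<Long, Double>> result = iteration.closeWith(delta, delta);

            result.print(); // older Flink versions may additionally need env.execute()
        }
    }

The other knob mentioned above, the memory configuration, lives in flink-conf.yaml (taskmanager.heap.mb and taskmanager.memory.fraction), which is what the next mail experiments with.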
Hi,

I tested three configurations:

    taskmanager.heap.mb: 6144, taskmanager.memory.fraction: 0.5
    taskmanager.heap.mb: 5544, taskmanager.memory.fraction: 0.6
    taskmanager.heap.mb: 5144, taskmanager.memory.fraction: 0.7

The error occurs in all three configurations. In the last configuration, I can even find another exception in the logs of one of the task managers:

19.Apr. 13:39:29 INFO Task - IterationHead(WorksetIteration (Resolved-Redirects)) (10/10) switched to FAILED : java.lang.IndexOutOfBoundsException: Index: 161, Size: 161
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.resetTo(InMemoryPartition.java:352)
    at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.access$100(InMemoryPartition.java:301)
    at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:226)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
    at java.lang.Thread.run(Thread.java:745)

Greetings,
Stefan

On 17 April 2015 at 16:05, Stephan Ewen <[hidden email]> wrote:
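(For context: taskmanager.memory.fraction controls the share of the heap that Flink manages itself, so, roughly and ignoring network buffers, these three settings correspond to about 6144 x 0.5 ≈ 3072 MB, 5544 x 0.6 ≈ 3326 MB, and 5144 x 0.7 ≈ 3600 MB of managed memory per task manager; all three runs therefore operate with a fairly similar managed-memory budget.)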
Hi,
I also get the EOFException on 0.9-SNAPSHOT when using a modified SingleSourceShortestPaths on big graphs. After I set the solution set to "unmanaged", the job finishes. But this only works up to a certain point: beyond it, I cannot increase the input size further without getting an OutOfMemoryError.

Best,
Mihail

On 19.04.2015 13:52, Stefan Bunk wrote:
Hi! The "unmanaged solution set" keeps the data outside the Flink managed memory, and is expected to throw OOM exceptions at some point. The managed variant clearly has a bug. Can you open a JIRA ticket for that and describe briefly where it happens and paste also the stacktrace? Stephan# On Sun, Apr 19, 2015 at 2:17 PM, Mihail Vieru <[hidden email]> wrote: