Hi to all,
I’m aware there are a few threads on this, but I haven’t been able to solve an issue I am seeing and hoped someone can help. I’m trying to run the following:

```scala
val connectedNetwork = new org.apache.flink.api.scala.DataSet[Vertex[Long, Long]](
  Graph.fromTuple2DataSet(inputEdges, vertexInitialiser, env)
    .run(new ConnectedComponents[Long, NullValue](100)))
```

and hitting this error:

```
java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 8 maxPartition: 8 number of overflow segments: 122 bucketSize: 206 Overall memory: 19365888 Partition memory: 8388608
	at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:753)
	at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromStart(CompactingHashTable.java:546)
	at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:423)
	at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:325)
	at org.apache.flink.runtime.iterative.task.IterationHeadTask.readInitialSolutionSet(IterationHeadTask.java:212)
	at org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:273)
	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Unknown Source)
```

I’m running Flink 1.0.3 on Windows 10 using start-local.bat. I have Xmx set to 6500 MB, 8 workers, parallelism 8, and other memory settings left at default.

The inputEdges dataset contains 141 MB of (Long, Long) pairs, which is around 6 million edges. ParentID is unique and always negative; ChildID is non-unique and always positive (simulating a bipartite graph). An example few rows:

```
-91498683401,1738
-135344401,5370
-100260517801,7970
-154352186001,12311
-160265532002,12826
```

The vast majority of the childIds are actually unique, and the most popular ID occurs only 10 times.
VertexInitialiser just sets the vertex value to the id. Hopefully this is just a memory setting I’m not seeing for the hash table, as it dies almost instantly; I don’t think it gets very far into the dataset. I understand that the CompactingHashTable cannot spill, but I’d be surprised if it needed to at these low volumes. Many thanks for any help!

Rob
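For reference, the initialiser described above reduces to an identity function on the vertex id. A minimal sketch (assumption: in the actual job this logic is wrapped in whatever MapFunction shape the Gelly API of that Flink version expects):

```scala
// Sketch of the vertex initialiser described in the thread:
// each vertex's initial value is simply its own id.
val vertexInitialiser: Long => Long = id => id

// e.g. vertexInitialiser(-91498683401L) yields -91498683401L
```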
Hi Rob,

On 13 May 2016 at 11:22, Arkay <[hidden email]> wrote:
> Hi to all,

The start-local script will start a single JobManager and TaskManager. What do you mean by 8 workers? Have you set the numberOfTaskSlots to 8?

To give all available memory to your TaskManager, you should set the "taskmanager.heap.mb" configuration option in flink-conf.yaml. Can you open the Flink dashboard at http://localhost:8081/ and check the configuration of your TaskManager?

Cheers,
-Vasia.
Thanks Vasia,
Apologies, yes, by workers I mean I have set taskmanager.numberOfTaskSlots: 8 and parallelism.default: 8 in flink-conf.yaml. I have also set taskmanager.heap.mb: 6500. In the dashboard it is showing free memory as 5.64 GB and Flink Managed Memory as 3.90 GB.

Thanks,
Rob
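For anyone following along, the setup described in this thread amounts to these flink-conf.yaml entries (values exactly as reported above):

```yaml
# flink-conf.yaml (Flink 1.0.x) — settings as reported in this thread
taskmanager.heap.mb: 6500
taskmanager.numberOfTaskSlots: 8
parallelism.default: 8
```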
Thanks for checking, Rob! I don't see any reason for the job to fail with this configuration and input size. I have no experience running Flink on Windows though, so I might be missing something. Do you get a similar error with smaller inputs?

-Vasia.

On 13 May 2016 at 13:27, Arkay <[hidden email]> wrote:
> Thanks Vasia,
Hi Vasia,
It seems to work OK up to about 50 MB of input, and dies after that point. If I disable just this connected components step, the rest of my program is happy with the full 1.5 GB test dataset. It seems to be specifically limited to GraphAlgorithms in my case.

Do you know what the units are when it says "Partition memory: 8388608"? If that is bytes, then it sounds like it's using around 256 MB per hash table of 32 partitions (which is then multiplied by the number of task slots, I guess). Can this number be configured, do you know? Perhaps the Windows version of the JVM is defaulting it to a lower value than on Linux?

Edit: I've noticed that halving the parallelism to 4 more than doubles the partition memory to 18513920, but it still fails instantly.

Thanks,
Rob
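The arithmetic behind the 256 MB estimate above can be checked directly (assumption, later confirmed in the thread: the figures in the error message are bytes):

```scala
// From the error message: numPartitions: 32, Partition memory: 8388608.
// Assumption: these figures are bytes (8388608 B = 8 MiB per partition).
val numPartitions        = 32
val partitionMemoryBytes = 8388608L
val totalBytes           = numPartitions * partitionMemoryBytes
val totalMiB             = totalBytes / (1024 * 1024)  // 32 * 8 MiB = 256 MiB
```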
On 13 May 2016 at 14:28, Arkay <[hidden email]> wrote:
> Hi Vasia,

So your program has other steps before/after the connected components algorithm? Could it be that you have some expensive operation that competes for memory with the hash table?
Yes, that's bytes.

> Can this number be configured, do you know? Perhaps the Windows version of

By default, the hash table uses Flink's managed memory. That's 3.0 GB in your case (0.7 of the total memory by default). You can change this fraction by setting "taskmanager.memory.fraction" in the configuration. See [1] for other managed memory options.

Hope this helps!
-Vasia.
Thanks for the link. I had experimented with those options, apart from taskmanager.memory.off-heap: true. It turns out that setting allows it to run through happily! I don't know if that is a peculiarity of a Windows JVM, as I understand that setting is purely an efficiency improvement?
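For later readers, the managed-memory knobs discussed in this thread all live in flink-conf.yaml. A sketch of the relevant entries (the fraction default is per Vasia's note above; Flink 1.0.x option names):

```yaml
# flink-conf.yaml — managed-memory options discussed in this thread
taskmanager.memory.off-heap: true    # the setting that let this job run through
# taskmanager.memory.fraction: 0.7   # default fraction of memory used as Flink managed memory
```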
For your first question: yes, I have a number of steps that get scheduled around the same time in the job. It's not really avoidable, unless there are optimizer hints to tell the system to only run certain steps on their own? I will try cutting the rest of the program out as a test, however.

Thanks very much for your help with this, and all your excellent work on Flink and Gelly :)

Rob
Hey Rob,

On 13 May 2016 at 15:45, Arkay <[hidden email]> wrote:
> Thanks for the link, I had experimented with those options, apart from

Great to hear that you solved your problem! I'm not sure whether it's a Windows peculiarity; maybe someone else could clear this up.
You could try setting the execution mode to BATCH/BATCH_FORCED with "env.getConfig.setExecutionMode()". It is typically more expensive than the default pipelined mode, but it guarantees that no successive operations are run concurrently.

> I will try cutting the rest of the program out as a test however.

:)) Let us know if you run into any more problems or have questions.

Cheers,
-V.