Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Yiannis Gkoufas
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!
Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Stephan Ewen

Hi Yiannis!

You need to increase the number of buffers for your setup. Here is a FAQ entry with a few pointers:

http://flink.apache.org/docs/0.8/faq.html#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this

Greetings,
Stephan

Am 18.02.2015 21:21 schrieb "Yiannis Gkoufas" <[hidden email]>:
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!
Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Fabian Hueske-2
In reply to this post by Yiannis Gkoufas
Hi Yiannis,

if you scale Flink to larger setups you need to adapt the number of network buffers.
The background section of the configuration reference explains the details on that [1].

Let us know, if that helped to solve the problem.

Best, Fabian


2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!

Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Yiannis Gkoufas
Hi!

thank you for your replies! 
I increased the number of network buffers:

taskmanager.network.numberOfBuffers: 2048

but I am still getting the same error:

Insufficient number of network buffers: required 120, but only 2 of 2048 available.

Thanks a lot!


On 18 February 2015 at 20:27, Fabian Hueske <[hidden email]> wrote:
Hi Yiannis,

if you scale Flink to larger setups you need to adapt the number of network buffers.
The background section of the configuration reference explains the details on that [1].

Let us know, if that helped to solve the problem.

Best, Fabian


2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!


Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Fabian Hueske-2
2048 is the default. So you didn't actually increase the number of buffers ;-)

Try 4096 or so.

2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi!

thank you for your replies! 
I increased the number of network buffers:

taskmanager.network.numberOfBuffers: 2048

but I am still getting the same error:

Insufficient number of network buffers: required 120, but only 2 of 2048 available.

Thanks a lot!


On 18 February 2015 at 20:27, Fabian Hueske <[hidden email]> wrote:
Hi Yiannis,

if you scale Flink to larger setups you need to adapt the number of network buffers.
The background section of the configuration reference explains the details on that [1].

Let us know, if that helped to solve the problem.

Best, Fabian


2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!



Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Yiannis Gkoufas
Perfect! It worked! Thanks a lot for the help!

On 18 February 2015 at 22:13, Fabian Hueske <[hidden email]> wrote:
2048 is the default. So you didn't actually increase the number of buffers ;-)

Try 4096 or so.

2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi!

thank you for your replies! 
I increased the number of network buffers:

taskmanager.network.numberOfBuffers: 2048

but I am still getting the same error:

Insufficient number of network buffers: required 120, but only 2 of 2048 available.

Thanks a lot!


On 18 February 2015 at 20:27, Fabian Hueske <[hidden email]> wrote:
Hi Yiannis,

if you scale Flink to larger setups you need to adapt the number of network buffers.
The background section of the configuration reference explains the details on that [1].

Let us know, if that helped to solve the problem.

Best, Fabian


2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
Hi there,

I have a cluster of 10 nodes with 12 CPUs each.
This is my configuration:

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 4024

taskmanager.heap.mb: 8096

taskmanager.numberOfTaskSlots: 12

parallelization.degree.default: 120

I have been getting the following error:

java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120) - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 120, but only 2 of 2048 available.
at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:244)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


I failed to get any info online on how to solve it.
Any help would be welcome.

Thank you!




Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Henry Saputra
Would it be helpful to add additional message in the error message in
NetworkBufferPool#createBufferPool to check the
taskmanager.network.numberOfBuffers property?


- Henry

On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <[hidden email]> wrote:

> Perfect! It worked! Thanks a lot for the help!
>
> On 18 February 2015 at 22:13, Fabian Hueske <[hidden email]> wrote:
>>
>> 2048 is the default. So you didn't actually increase the number of buffers
>> ;-)
>>
>> Try 4096 or so.
>>
>> 2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
>>>
>>> Hi!
>>>
>>> thank you for your replies!
>>> I increased the number of network buffers:
>>>
>>> taskmanager.network.numberOfBuffers: 2048
>>>
>>> but I am still getting the same error:
>>>
>>> Insufficient number of network buffers: required 120, but only 2 of 2048
>>> available.
>>>
>>> Thanks a lot!
>>>
>>>
>>> On 18 February 2015 at 20:27, Fabian Hueske <[hidden email]> wrote:
>>>>
>>>> Hi Yiannis,
>>>>
>>>> if you scale Flink to larger setups you need to adapt the number of
>>>> network buffers.
>>>> The background section of the configuration reference explains the
>>>> details on that [1].
>>>>
>>>> Let us know, if that helped to solve the problem.
>>>>
>>>> Best, Fabian
>>>>
>>>> [1] http://flink.apache.org/docs/0.8/config.html#background
>>>>
>>>> 2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I have a cluster of 10 nodes with 12 CPUs each.
>>>>> This is my configuration:
>>>>>
>>>>> jobmanager.rpc.port: 6123
>>>>>
>>>>> jobmanager.heap.mb: 4024
>>>>>
>>>>> taskmanager.heap.mb: 8096
>>>>>
>>>>> taskmanager.numberOfTaskSlots: 12
>>>>>
>>>>> parallelization.degree.default: 120
>>>>>
>>>>> I have been getting the following error:
>>>>>
>>>>> java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120)
>>>>> - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27
>>>>> - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network
>>>>> buffers: required 120, but only 2 of 2048 available.
>>>>> at
>>>>> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
>>>>> at
>>>>> org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> at
>>>>> org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:247)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:244)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>>>>> at
>>>>> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>>
>>>>> I failed to get any info online on how to solve it.
>>>>> Any help would be welcome.
>>>>>
>>>>> Thank you!
>>>>
>>>>
>>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

rmetzger0
I agree with Henry.
We should include the name of the required configuration parameter into the exception.
Users often run into this issue.

I've filed a JIRA to track the fix: https://issues.apache.org/jira/browse/FLINK-1646


On Thu, Feb 19, 2015 at 6:18 PM, Henry Saputra <[hidden email]> wrote:
Would it be helpful to add additional message in the error message in
NetworkBufferPool#createBufferPool to check the
taskmanager.network.numberOfBuffers property?


- Henry

On Wed, Feb 18, 2015 at 4:32 PM, Yiannis Gkoufas <[hidden email]> wrote:
> Perfect! It worked! Thanks a lot for the help!
>
> On 18 February 2015 at 22:13, Fabian Hueske <[hidden email]> wrote:
>>
>> 2048 is the default. So you didn't actually increase the number of buffers
>> ;-)
>>
>> Try 4096 or so.
>>
>> 2015-02-18 22:59 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
>>>
>>> Hi!
>>>
>>> thank you for your replies!
>>> I increased the number of network buffers:
>>>
>>> taskmanager.network.numberOfBuffers: 2048
>>>
>>> but I am still getting the same error:
>>>
>>> Insufficient number of network buffers: required 120, but only 2 of 2048
>>> available.
>>>
>>> Thanks a lot!
>>>
>>>
>>> On 18 February 2015 at 20:27, Fabian Hueske <[hidden email]> wrote:
>>>>
>>>> Hi Yiannis,
>>>>
>>>> if you scale Flink to larger setups you need to adapt the number of
>>>> network buffers.
>>>> The background section of the configuration reference explains the
>>>> details on that [1].
>>>>
>>>> Let us know, if that helped to solve the problem.
>>>>
>>>> Best, Fabian
>>>>
>>>> [1] http://flink.apache.org/docs/0.8/config.html#background
>>>>
>>>> 2015-02-18 21:18 GMT+01:00 Yiannis Gkoufas <[hidden email]>:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I have a cluster of 10 nodes with 12 CPUs each.
>>>>> This is my configuration:
>>>>>
>>>>> jobmanager.rpc.port: 6123
>>>>>
>>>>> jobmanager.heap.mb: 4024
>>>>>
>>>>> taskmanager.heap.mb: 8096
>>>>>
>>>>> taskmanager.numberOfTaskSlots: 12
>>>>>
>>>>> parallelization.degree.default: 120
>>>>>
>>>>> I have been getting the following error:
>>>>>
>>>>> java.lang.Exception: Failed to deploy the task Reduce (SUM(1)) (65/120)
>>>>> - execution #0 to slot SimpleSlot (1)(0) - efc370a0b2a9a63f2e7b960cfe4e4c27
>>>>> - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network
>>>>> buffers: required 120, but only 2 of 2048 available.
>>>>> at
>>>>> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:155)
>>>>> at
>>>>> org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:163)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:426)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> at
>>>>> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> at
>>>>> org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:344)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:247)
>>>>> at akka.dispatch.OnComplete.internal(Future.scala:244)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
>>>>> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
>>>>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>>>>> at
>>>>> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>>
>>>>> I failed to get any info online on how to solve it.
>>>>> Any help would be welcome.
>>>>>
>>>>> Thank you!
>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: Exception: Insufficient number of network buffers: required 120, but only 2 of 2048 available

Ufuk Celebi
Good idea. I've changed the message. :)

On 04 Mar 2015, at 14:51, Robert Metzger <[hidden email]> wrote:

> I agree with Henry.
> We should include the name of the required configuration parameter into the exception.
> Users often run into this issue.