I am running with one job manager and three task managers.

Each task manager is receiving at most 8 GB of data, but the job is timing out.

What parameters must I adjust?

Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd) switched from SCHEDULED to FAILED on [unassigned resource].
java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_282]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_282]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_282]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_282]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_282]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_282]
    at org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:223) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:168) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_282]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:442) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:209) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [feature-LUM-3882-toledo--850a6747.jar:?]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [feature-LUM-3882-toledo--850a6747.jar:?]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [feature-LUM-3882-toledo--850a6747.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [feature-LUM-3882-toledo--850a6747.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [feature-LUM-3882-toledo--850a6747.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.actor.Actor.aroundReceive(Actor.scala:517) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.actor.Actor.aroundReceive$(Actor.scala:515) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [feature-LUM-3882-toledo--850a6747.jar:?]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [feature-LUM-3882-toledo--850a6747.jar:?]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
    at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[feature-LUM-3882-toledo--850a6747.jar:?]
    ... 26 more
Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
    at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0
Hi, Marco,
The root cause is the NoResourceAvailableException. Could you provide the following information?
- How many slots does each TM have?
- Your job's topology.

It would also be good to share the job manager log.

Best,
Yangze Guo

On Tue, May 25, 2021 at 12:10 PM Marco Villalobos <[hidden email]> wrote:
> I am running with one job manager and three task managers.
> Each task manager is receiving at most 8 gb of data, but the job is timing out.
> What parameters must I adjust?
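
For context on the slot question in this reply: with Flink's default slot sharing, a job generally needs as many slots as its highest operator parallelism, and the cluster offers (number of task managers) x (slots per task manager). The sketch below only does that arithmetic. The three task managers come from the original post, the parallelism of 32 is inferred from the "(15/32)" label in the log, and the 8 slots per task manager is a made-up value, since the thread never states it. The class and method names are hypothetical.

```java
public class SlotCheck {

    // Assumption: every operator is in the default slot sharing group, so the
    // job needs as many slots as its highest operator parallelism.
    public static int requiredSlots(int maxOperatorParallelism) {
        return maxOperatorParallelism;
    }

    // Total slots the cluster can offer.
    public static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Smallest slots-per-TM value that covers the requirement (ceiling division).
    public static int minSlotsPerTaskManager(int required, int taskManagers) {
        return (required + taskManagers - 1) / taskManagers;
    }

    public static void main(String[] args) {
        int taskManagers = 3;  // from the original post
        int parallelism = 32;  // inferred from "(15/32)" in the log
        int slotsPerTm = 8;    // hypothetical; not stated in the thread

        int required = requiredSlots(parallelism);
        int available = availableSlots(taskManagers, slotsPerTm);

        System.out.println("required slots:  " + required);
        System.out.println("available slots: " + available);
        if (available < required) {
            System.out.println("short by " + (required - available)
                    + " slots; need at least "
                    + minSlotsPerTaskManager(required, taskManagers)
                    + " slots per task manager, or a lower parallelism");
        }
    }
}
```

If the numbers do come up short, the usual knobs are `taskmanager.numberOfTaskSlots`, the number of task managers, or the job's parallelism. Raising `slot.request.timeout` (the 300000 ms in the trace appears to match its default) would only delay the failure when the slots simply do not exist.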
Hi Marco,

How are you starting the job? For example, are you using YARN as the resource manager? It looks like there are just not enough resources in the cluster to run this job. Assuming the cluster is correctly configured and the Task Managers are able to connect to the Job Manager (can you share the full JM/TM logs?), I would say your job is simply too large (parallelism of 32?) for the given configuration.

Best,
Piotrek

On Tue, 25 May 2021 at 06:10, Marco Villalobos <[hidden email]> wrote: