Hi to all,
I have this strange error in my job and I don't know what's going on. What can I do? The full exception is: The slot in which the task was scheduled has been killed (probably loss of TaskManager). at org.apache.flink.runtime.instance.SimpleSlot.cancel(SimpleSlot.java:98) at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSimpleSlot(SlotSharingGroupAssignment.java:335) at org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:319) at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:106) at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:151) at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:435) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) |
This means that the TaskManager was lost. The JobManager can no longer reach the TaskManager and consists all tasks executing ob the TaskManager as failed. Have a look at the TaskManager log, it should describe why the TaskManager failed. Am 15.04.2015 14:45 schrieb "Flavio Pompermaier" <[hidden email]>:
|
Yes , sorry for that..I found it somewhere in the logs..the problem was that the program didn't die immediately but was somehow hanging and I discovered the source of the problem only running the program on a subset of the data.
Thnks for the support, Flavio On Wed, Apr 15, 2015 at 2:56 PM, Stephan Ewen <[hidden email]> wrote:
|
I witnessed a similar issue yesterday on a simple job (single task chain, no shuffles) with a release-0.9 based fork. 2015-04-15 14:59 GMT+02:00 Flavio Pompermaier <[hidden email]>:
|
Something similar in flink-0.10-SNAPSHOT: 06/29/2015 10:33:46 CHAIN Join(Join at main(TriangleCount.java:79)) -> Combine (Reduce at main(TriangleCount.java:79))(222/224) switched to FAILED java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 57c67d938c9144bec5ba798bb8ebe636 @ wally025 - 8 slots - URL: akka.tcp://flink@130.149.249.35:56135/user/taskmanager at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151) at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:154) at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:421) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) at akka.actor.ActorCell.invoke(ActorCell.scala:486) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 06/29/2015 10:33:46 Job execution switched to status FAILING. On Mon, Jun 29, 2015 at 1:08 PM, Alexander Alexandrov <[hidden email]> wrote:
|
Do you have the JobManager and TaskManager logs of the corresponding TM, by any chance? On Mon, Jun 29, 2015 at 8:12 PM, Andra Lungu <[hidden email]> wrote:
|
Hey Till, I managed to reproduce the bug; The logs are in the corresponding JIRA [hopefully I got the right ones :)]: FLINK-2299On Tue, Jun 30, 2015 at 10:51 AM, Till Rohrmann <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |