Streaming job failure due to loss of Taskmanagers

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Streaming job failure due to loss of Taskmanagers

Ravinder Kaur
Hello All,

I'm running the WordCount example streaming job and it fails because of loss of Taskmanagers.

When gone through the logs of the taskmanager it has the following messages

15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - State backend is set to heap memory (checkpoint to jobmanager)
15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)
15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
15:27:29,384 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)
15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)
15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task                     - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at java.lang.Thread.run(Thread.java:745)
15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Cancelling all computations and discarding all cached data.
15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (17/25)
15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task                     - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: Read Text File Source -> Flat Map (20/25)
15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
        at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: Read Text File Source -> Flat Map (6/25)
15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
        at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Read Text File Source -> Flat Map (20/25)
15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de).
15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (15/25)
15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task                     - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: Read Text File Source -> Flat Map (13/25)
15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
        at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Read Text File Source -> Flat Map (6/25)
15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (16/25)
15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task                     - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: Read Text File Source -> Flat Map (24/25)
15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
        at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
        at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (18/25)
15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task                     - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Read Text File Source -> Flat Map (24/25)
15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Read Text File Source -> Flat Map (13/25)
15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Disassociating from JobManager
15:27:29,572 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:27:29,582 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:29,587 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful shutdown (took 17 ms).
15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful shutdown (took 2 ms).
15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
15:27:37,218 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
15:27:45,235 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30 seconds)
15:28:01,237 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30 seconds)
15:28:31,261 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:45,960 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
15:29:01,273 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout: 30 seconds)
15:30:11,562 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:30:25,958 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

Could anyone explain what is the reason for the systems getting disassociated and be quarantined?

Kind Regards,
Ravinder Kaur

Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Ufuk Celebi
Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:

> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>
Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Ravinder Kaur
Hi Ufuk,

Here is the log of the JobManager

15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
15:14:26,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
15:14:26,082 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
15:14:26,091 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
15:14:26,093 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
15:14:26,094 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
15:14:26,130 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
15:14:26,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
15:14:26,147 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
15:14:26,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
15:14:26,169 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
15:14:26,177 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
15:14:26,247 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
15:14:26,259 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
15:14:26,260 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
15:14:26,520 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
15:14:26,739 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
15:14:26,740 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
15:14:26,747 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
15:14:26,749 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
15:14:26,755 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
15:14:26,775 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
15:14:26,825 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
15:14:27,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
15:14:27,102 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
15:14:27,103 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
15:22:36,119 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
15:22:43,697 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
15:22:46,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
15:23:00,969 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
15:23:25,615 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
15:23:26,734 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
15:24:04,736 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
15:24:16,436 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
15:24:17,705 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
15:27:05,900 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager terminated.
15:27:05,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
15:27:05,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
15:27:05,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21.
15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED
15:27:06,021 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED
15:27:06,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED
15:27:06,031 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED
15:27:06,033 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED
15:27:06,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED
15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager terminated.
15:27:06,899 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
15:27:06,900 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED
15:27:06,897 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
15:27:06,902 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
15:27:06,904 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED
15:27:06,905 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
15:27:06,906 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED
15:27:06,916 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered task managers 5. Number of available slots 17.
15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager terminated.
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED
15:27:06,920 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
15:27:06,924 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED
15:27:06,926 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
15:27:06,927 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED
15:27:06,929 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED
15:27:06,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered task managers 4. Number of available slots 13.
15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager terminated.
15:27:06,932 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
15:27:06,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED
15:27:06,934 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED
15:27:06,935 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
15:27:06,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED
15:27:06,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED
15:27:06,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
15:27:06,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED
15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered task managers 3. Number of available slots 9.
15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILED.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,543 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,964 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,978 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:28:45,991 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:46,108 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
15:30:25,988 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:25,992 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:26,090 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,994 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,996 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:06,007 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:55249


Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>

Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Ravinder Kaur
Hello All,

After trying to debug, the log of jobmanager suggests that the failed taskmanagers has stopped sending heartbeat messages. After this the TMs were detected unreachable by the JM. 

If the actor system is dead why is it not restarted by the supervisor during the Job? The TM is re-registered at the JM only after the job is switched to failed. The failure is still unclear. Can someone help diagnose the issue?

Kind Regards,
Ravinder Kaur




On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]> wrote:
Hi Ufuk,

Here is the log of the JobManager

15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
15:14:26,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
15:14:26,082 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
15:14:26,091 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
15:14:26,093 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
15:14:26,094 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
15:14:26,130 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
15:14:26,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
15:14:26,147 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
15:14:26,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
15:14:26,169 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
15:14:26,177 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
15:14:26,247 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
15:14:26,259 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
15:14:26,260 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
15:14:26,520 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
15:14:26,739 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
15:14:26,740 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
15:14:26,747 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
15:14:26,749 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
15:14:26,755 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
15:14:26,775 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
15:14:26,825 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
15:14:27,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
15:14:27,102 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
15:14:27,103 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
15:22:36,119 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
15:22:43,697 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
15:22:46,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
15:23:00,969 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
15:23:25,615 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
15:23:26,734 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
15:24:04,736 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
15:24:16,436 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
15:24:17,705 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
15:27:05,900 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager terminated.
15:27:05,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
15:27:05,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
15:27:05,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21.
15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED
15:27:06,021 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED
15:27:06,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED
15:27:06,031 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED
15:27:06,033 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED
15:27:06,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED
15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager terminated.
15:27:06,899 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
15:27:06,900 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED
15:27:06,897 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
15:27:06,902 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
15:27:06,904 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED
15:27:06,905 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
15:27:06,906 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED
15:27:06,916 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered task managers 5. Number of available slots 17.
15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager terminated.
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED
15:27:06,920 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
15:27:06,924 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED
15:27:06,926 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
15:27:06,927 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED
15:27:06,929 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED
15:27:06,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered task managers 4. Number of available slots 13.
15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager terminated.
15:27:06,932 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
15:27:06,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED
15:27:06,934 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED
15:27:06,935 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
15:27:06,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED
15:27:06,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED
15:27:06,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
15:27:06,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED
15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered task managers 3. Number of available slots 9.
15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILED.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,543 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,964 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,978 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:28:45,991 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:46,108 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
15:30:25,988 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:25,992 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:26,090 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,994 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,996 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:06,007 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:55249


Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>


Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Balaji Rajagopalan
Ravinder,
  You could first try to fun standalone examples of flink without hadoop, and then try to run flink on hadoop. Your trouble seems to point to some hadoop cluster set up rather than anything to do with flink ? 

balaji 

On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <[hidden email]> wrote:
Hello All,

After trying to debug, the log of jobmanager suggests that the failed taskmanagers has stopped sending heartbeat messages. After this the TMs were detected unreachable by the JM. 

If the actor system is dead why is it not restarted by the supervisor during the Job? The TM is re-registered at the JM only after the job is switched to failed. The failure is still unclear. Can someone help diagnose the issue?

Kind Regards,
Ravinder Kaur




On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]> wrote:
Hi Ufuk,

Here is the log of the JobManager

15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
15:14:26,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
15:14:26,082 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
15:14:26,091 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
15:14:26,093 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
15:14:26,094 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
15:14:26,130 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
15:14:26,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
15:14:26,147 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
15:14:26,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
15:14:26,169 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
15:14:26,177 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
15:14:26,247 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
15:14:26,259 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
15:14:26,260 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
15:14:26,520 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
15:14:26,739 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
15:14:26,740 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
15:14:26,747 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
15:14:26,749 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
15:14:26,755 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
15:14:26,775 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
15:14:26,825 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
15:14:27,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
15:14:27,102 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
15:14:27,103 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
15:22:36,119 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
15:22:43,697 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
15:22:46,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
15:23:00,969 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
15:23:25,615 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
15:23:26,734 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
15:24:04,736 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
15:24:16,436 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
15:24:17,705 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
15:27:05,900 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager terminated.
15:27:05,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
15:27:05,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
15:27:05,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21.
15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED
15:27:06,021 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED
15:27:06,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED
15:27:06,031 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED
15:27:06,033 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED
15:27:06,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED
15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager terminated.
15:27:06,899 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
15:27:06,900 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED
15:27:06,897 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
15:27:06,902 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
15:27:06,904 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED
15:27:06,905 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
15:27:06,906 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED
15:27:06,916 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered task managers 5. Number of available slots 17.
15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager terminated.
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED
15:27:06,920 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
15:27:06,924 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED
15:27:06,926 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
15:27:06,927 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED
15:27:06,929 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED
15:27:06,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered task managers 4. Number of available slots 13.
15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager terminated.
15:27:06,932 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
15:27:06,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED
15:27:06,934 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED
15:27:06,935 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
15:27:06,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED
15:27:06,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED
15:27:06,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
15:27:06,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED
15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered task managers 3. Number of available slots 9.
15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILED.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,543 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,964 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,978 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:28:45,991 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:46,108 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
15:30:25,988 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:25,992 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:26,090 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,994 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,996 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:06,007 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:55249


Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>



Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Ravinder Kaur
Hello Balaji,

I'm running Flink on a standalone cluster and not on hadoop, though I had hadoop installed on these machines I do not use it. but that should not cause any problem. 

I have been successfully running Batch and Streaming jobs for a while now before this issue occured. According to the logs I think it is related to AKKA, but I'm not sure.

Kind Regards,
Ravinder Kaur

On Wed, Mar 23, 2016 at 1:06 PM, Balaji Rajagopalan <[hidden email]> wrote:
Ravinder,
  You could first try to fun standalone examples of flink without hadoop, and then try to run flink on hadoop. Your trouble seems to point to some hadoop cluster set up rather than anything to do with flink ? 

balaji 

On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <[hidden email]> wrote:
Hello All,

After trying to debug, the log of jobmanager suggests that the failed taskmanagers has stopped sending heartbeat messages. After this the TMs were detected unreachable by the JM. 

If the actor system is dead why is it not restarted by the supervisor during the Job? The TM is re-registered at the JM only after the job is switched to failed. The failure is still unclear. Can someone help diagnose the issue?

Kind Regards,
Ravinder Kaur




On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]> wrote:
Hi Ufuk,

Here is the log of the JobManager

15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
15:14:26,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
15:14:26,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
15:14:26,036 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
15:14:26,039 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
15:14:26,082 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
15:14:26,091 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
15:14:26,093 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
15:14:26,094 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
15:14:26,130 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
15:14:26,142 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
15:14:26,147 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
15:14:26,148 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
15:14:26,168 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
15:14:26,169 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
15:14:26,171 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
15:14:26,177 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
15:14:26,192 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
15:14:26,247 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
15:14:26,259 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
15:14:26,260 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
15:14:26,520 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
15:14:26,530 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
15:14:26,739 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
15:14:26,740 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
15:14:26,747 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
15:14:26,748 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
15:14:26,749 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
15:14:26,755 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
15:14:26,775 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
15:14:26,776 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
15:14:26,825 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
15:14:27,099 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
15:14:27,102 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
15:14:27,103 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
15:22:36,119 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
15:22:43,697 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
15:22:46,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
15:23:00,969 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
15:23:25,615 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
15:23:26,734 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
15:24:04,736 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
15:24:16,436 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
15:24:17,705 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
15:27:05,900 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager terminated.
15:27:05,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
15:27:05,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
15:27:05,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
15:27:05,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
15:27:05,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING
15:27:05,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING
15:27:05,940 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING
15:27:05,941 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING
15:27:05,942 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED
15:27:05,943 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED
15:27:05,944 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED
15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21.
15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED
15:27:06,020 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED
15:27:06,021 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED
15:27:06,030 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED
15:27:06,031 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED
15:27:06,033 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED
15:27:06,034 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED
15:27:06,035 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED
15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager terminated.
15:27:06,899 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED
15:27:06,900 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED
15:27:06,897 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
15:27:06,902 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
15:27:06,902 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED
15:27:06,904 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED
15:27:06,905 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED
15:27:06,906 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED
15:27:06,916 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED
15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered task managers 5. Number of available slots 17.
15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager terminated.
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED
15:27:06,918 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED
15:27:06,920 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED
15:27:06,924 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED
15:27:06,926 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED
15:27:06,927 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED
15:27:06,929 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED
15:27:06,930 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED
15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered task managers 4. Number of available slots 13.
15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager terminated.
15:27:06,932 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED
15:27:06,933 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED
15:27:06,934 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED
15:27:06,935 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED
15:27:06,936 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED
15:27:06,937 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED
15:27:06,938 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED
15:27:06,939 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED
15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered task managers 3. Number of available slots 9.
15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILED.
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
        at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
        at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
        at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
        at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
        at akka.actor.ActorCell.invoke(ActorCell.scala:486)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,543 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,964 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,978 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:28:45,991 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:46,108 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
15:30:25,988 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:25,992 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:26,090 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,994 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,996 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,999 INFO  Remoting                                                      - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:06,007 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:55249


Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
Hey Ravinder,

can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager,
because that one is not reachable anymore. At this point, the running
shuffles are cancelled and you see the follow up
RemoteTransportExceptions.

– Ufuk


On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
>
> When gone through the logs of the taskmanager it has the following messages
>
> 15:14:26,592 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask
> - State backend is set to heap memory (checkpoint to jobmanager)
> 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
> 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
> 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
> 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - TaskManager akka://flink/user/taskmanager disconnects from JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no
> longer reachable
> 15:27:29,384 WARN  akka.remote.RemoteWatcher
> - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
> - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
> exception.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate
> that the remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>  at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>         at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>         at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>         at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>         at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>         at java.lang.Thread.run(Thread.java:745)
> 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Cancelling all computations and discarding all cached data.
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (17/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (20/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
> 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
> 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
> 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
> 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (6/25)
> 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (20/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (6/25) (2151d07383014506c5f1b0283b1986de).
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (15/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (13/25)
> 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (6/25)
> 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (16/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Source: Read Text File Source -> Flat
> Map (24/25)
> 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
> - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with
> exception.
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects
> from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
> JobManager is no longer reachable
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>         at
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Triggering cancellation of task code Source: Read Text File Source -> Flat
> Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed
> (18/25)
> 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
> - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
> 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (24/25)
> 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: Read Text File Source -> Flat Map
> (13/25)
> 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Disassociating from JobManager
> 15:27:29,572 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has a UID that has been quarantined. Association aborted.
> 15:27:29,582 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,587 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:29,623 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 17 ms).
> 15:27:29,625 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
> 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
> 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
> 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
> 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
> 15:27:37,218 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout:
> 16000 milliseconds)
> 15:27:45,235 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30
> seconds)
> 15:28:01,237 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30
> seconds)
> 15:28:31,261 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:28:45,960 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30
> seconds)
> 15:29:01,273 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> - Trying to register at JobManager
> akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout:
> 30 seconds)
> 15:30:11,562 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
> 15:30:25,958 WARN  Remoting
> - Tried to associate with unreachable remote address
> [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms,
> all messages to this address will be delivered to dead letters. Reason: The
> remote system has quarantined this system. No further associations to the
> remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting
> disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
>




Reply | Threaded
Open this post in threaded view
|

Re: Streaming job failure due to loss of Taskmanagers

Maximilian Michels
Hi Ravinder,

It would be interesting to see the log output of the disconnected task
manager. Possibly, the task manager ran out of memory because the
state of your program got too big. You can work around this by using a
different state backend like RocksDB.

Cheers,
Max

On Wed, Mar 23, 2016 at 1:46 PM, Ravinder Kaur <[hidden email]> wrote:

> Hello Balaji,
>
> I'm running Flink on a standalone cluster and not on hadoop, though I had
> hadoop installed on these machines I do not use it. but that should not
> cause any problem.
>
> I have been successfully running Batch and Streaming jobs for a while now
> before this issue occured. According to the logs I think it is related to
> AKKA, but I'm not sure.
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Mar 23, 2016 at 1:06 PM, Balaji Rajagopalan
> <[hidden email]> wrote:
>>
>> Ravinder,
>>   You could first try to fun standalone examples of flink without hadoop,
>> and then try to run flink on hadoop. Your trouble seems to point to some
>> hadoop cluster set up rather than anything to do with flink ?
>>
>> balaji
>>
>> On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <[hidden email]>
>> wrote:
>>>
>>> Hello All,
>>>
>>> After trying to debug, the log of jobmanager suggests that the failed
>>> taskmanagers has stopped sending heartbeat messages. After this the TMs were
>>> detected unreachable by the JM.
>>>
>>> If the actor system is dead why is it not restarted by the supervisor
>>> during the Job? The TM is re-registered at the JM only after the job is
>>> switched to failed. The failure is still unclear. Can someone help diagnose
>>> the issue?
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>>
>>>
>>>
>>> On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]>
>>> wrote:
>>>>
>>>> Hi Ufuk,
>>>>
>>>> Here is the log of the JobManager
>>>>
>>>> 15:14:26,030 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,030 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
>>>> 15:14:26,035 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,036 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,036 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
>>>> 15:14:26,039 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,039 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
>>>> 15:14:26,082 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,091 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,093 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,094 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,130 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,142 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,147 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,148 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
>>>> 15:14:26,168 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,168 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
>>>> 15:14:26,169 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,171 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,171 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
>>>> 15:14:26,177 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,192 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,192 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
>>>> 15:14:26,247 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>>> switched from CREATED to SCHEDULED
>>>> 15:14:26,259 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>>> switched from SCHEDULED to DEPLOYING
>>>> 15:14:26,260 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying
>>>> Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
>>>> 15:14:26,520 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,530 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,530 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,739 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,740 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,747 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,748 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,748 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,749 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,755 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,775 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,776 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,776 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:26,825 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:27,099 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:27,102 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:14:27,103 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>>> switched from DEPLOYING to RUNNING
>>>> 15:22:36,119 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4)
>>>> switched from RUNNING to FINISHED
>>>> 15:22:43,697 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0)
>>>> switched from RUNNING to FINISHED
>>>> 15:22:46,998 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568)
>>>> switched from RUNNING to FINISHED
>>>> 15:23:00,969 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e)
>>>> switched from RUNNING to FINISHED
>>>> 15:23:25,615 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d)
>>>> switched from RUNNING to FINISHED
>>>> 15:23:26,734 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559)
>>>> switched from RUNNING to FINISHED
>>>> 15:24:04,736 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a)
>>>> switched from RUNNING to FINISHED
>>>> 15:24:16,436 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d)
>>>> switched from RUNNING to FINISHED
>>>> 15:24:17,705 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8)
>>>> switched from RUNNING to FINISHED
>>>> 15:27:05,900 WARN  akka.remote.RemoteWatcher
>>>> - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
>>>> 15:27:05,913 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager
>>>> terminated.
>>>> 15:27:05,913 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52)
>>>> switched from RUNNING to FAILED
>>>> 15:27:05,930 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,936 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,940 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,941 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,942 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,942 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,942 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,942 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,942 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>>> switched from RUNNING to CANCELING
>>>> 15:27:05,943 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,943 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,944 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,944 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,944 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,945 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,945 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9)
>>>> switched from CANCELING to FAILED
>>>> 15:27:05,945 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Unregistered task manager
>>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered
>>>> task managers 6. Number of available slots 21.
>>>> 15:27:05,947 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>>>> SocketTextStream Example) changed to FAILING.
>>>> java.lang.Exception: The slot in which the task was executed has been
>>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>>>> slave2 - 4 slots - URL:
>>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager
>>>>         at
>>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>>>         at
>>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>>>         at
>>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>>>         at
>>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>>>         at
>>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>>>         at
>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>>>         at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>         at
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>         at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>         at
>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>         at
>>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>         at
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>  at
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> 15:27:06,020 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,020 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,021 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,030 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,031 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,033 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,034 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,034 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,035 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea)
>>>> switched from CANCELING to CANCELED
>>>> 15:27:06,898 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager
>>>> terminated.
>>>> 15:27:06,899 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,900 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,897 WARN  akka.remote.RemoteWatcher
>>>> - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659]
>>>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>>> - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307]
>>>> 15:27:06,902 WARN  akka.remote.RemoteWatcher
>>>> - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717]
>>>> 15:27:06,902 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,904 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,905 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,906 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,916 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,917 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,917 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Unregistered task manager
>>>> akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered
>>>> task managers 5. Number of available slots 17.
>>>> 15:27:06,918 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager
>>>> terminated.
>>>> 15:27:06,918 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,918 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,920 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,924 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,926 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,927 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,929 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,930 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,931 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Unregistered task manager
>>>> akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered
>>>> task managers 4. Number of available slots 13.
>>>> 15:27:06,931 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager
>>>> terminated.
>>>> 15:27:06,932 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,933 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,934 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,935 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,936 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,937 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,938 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read
>>>> Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,939 INFO
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
>>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f)
>>>> switched from CANCELING to FAILED
>>>> 15:27:06,940 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Unregistered task manager
>>>> akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered
>>>> task managers 3. Number of available slots 9.
>>>> 15:27:06,940 INFO  org.apache.flink.runtime.jobmanager.JobManager
>>>> - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from
>>>> SocketTextStream Example) changed to FAILED.
>>>> java.lang.Exception: The slot in which the task was executed has been
>>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @
>>>> slave2 - 4 slots - URL:
>>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager
>>>>         at
>>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153)
>>>>         at
>>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>>>         at
>>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>>>         at
>>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>>>         at
>>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>>>         at
>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>>>         at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>         at
>>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>         at
>>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>         at
>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>         at
>>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>         at
>>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>         at
>>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>         at
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> 15:27:29,543 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:45,964 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:45,978 WARN  Remoting
>>>> - Tried to associate with unreachable remote address
>>>> [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms,
>>>> all messages to this address will be delivered to dead letters. Reason: The
>>>> remote system has a UID that has been quarantined. Association aborted.
>>>> 15:28:45,991 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:45,999 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:46,108 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Registering TaskManager at
>>>> akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as
>>>> dead earlier because of a heart-beat timeout.
>>>> 15:28:53,349 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Registered TaskManager at vm-10-155-208-135
>>>> (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as
>>>> 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4.
>>>> Current number of alive task slots is 13.
>>>> 15:30:25,988 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:30:25,992 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:30:26,090 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,994 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,996 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,999 INFO  Remoting
>>>> - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still
>>>> unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:06,007 WARN  Remoting
>>>> - Tried to associate with unreachable remote address
>>>> [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms,
>>>> all messages to this address will be delivered to dead letters. Reason: The
>>>> remote system has a UID that has been quarantined. Association aborted.
>>>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Registering TaskManager at
>>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as
>>>> dead earlier because of a heart-beat timeout.
>>>> 15:32:11,743 INFO  org.apache.flink.runtime.instance.InstanceManager
>>>> - Registered TaskManager at slave2
>>>> (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as
>>>> ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5.
>>>> Current number of alive task slots is 17.
>>>> 15:58:47,993 WARN  akka.remote.ReliableDeliverySupervisor
>>>> - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has
>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:48,176 WARN  akka.remote.ReliableDeliverySupervisor
>>>> - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has
>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:48,744 WARN  akka.remote.ReliableDeliverySupervisor
>>>> - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has
>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:51,736 WARN  akka.remote.ReliableDeliverySupervisor
>>>> - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has
>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:54,050 WARN  akka.remote.ReliableDeliverySupervisor
>>>> - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has
>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:54,502 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>>>> - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
>>>> 15:58:54,509 INFO  org.apache.flink.runtime.blob.BlobServer
>>>> - Stopped BLOB server at 0.0.0.0:55249
>>>>
>>>>
>>>> Kind Regards,
>>>> Ravinder Kaur
>>>>
>>>> On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
>>>>>
>>>>> Hey Ravinder,
>>>>>
>>>>> can you please share the JobManager logs as well?
>>>>>
>>>>> The logs say that the TaskManager disconnects from the JobManager,
>>>>> because that one is not reachable anymore. At this point, the running
>>>>> shuffles are cancelled and you see the follow up
>>>>> RemoteTransportExceptions.
>>>>>
>>>>> – Ufuk
>>>>>
>>>>>
>>>>> On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]>
>>>>> wrote:
>>>>> > Hello All,
>>>>> >
>>>>> > I'm running the WordCount example streaming job and it fails because
>>>>> > of loss
>>>>> > of Taskmanagers.
>>>>> >
>>>>> > When gone through the logs of the taskmanager it has the following
>>>>> > messages
>>>>> >
>>>>> > 15:14:26,592 INFO
>>>>> > org.apache.flink.streaming.runtime.tasks.StreamTask
>>>>> > - State backend is set to heap memory (checkpoint to jobmanager)
>>>>> > 15:14:26,703 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
>>>>> > 15:14:26,709 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
>>>>> > 15:14:26,712 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
>>>>> > 15:14:26,714 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
>>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with
>>>>> > exception.
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>>> > Connection unexpectedly closed by remote task manager
>>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>>> > indicate
>>>>> > that the remote task manager was lost.
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>>> >         at
>>>>> > io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>>> > 15:27:29,401 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - TaskManager akka://flink/user/taskmanager disconnects from
>>>>> > JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is
>>>>> > no
>>>>> > longer reachable
>>>>> > 15:27:29,384 WARN  akka.remote.RemoteWatcher
>>>>> > - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
>>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with
>>>>> > exception.
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>>> > Connection unexpectedly closed by remote task manager
>>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>>> > indicate
>>>>> > that the remote task manager was lost.
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>>> >         at
>>>>> > io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with
>>>>> > exception.
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>>> > Connection unexpectedly closed by remote task manager
>>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>>> > indicate
>>>>> > that the remote task manager was lost.
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>>> >         at
>>>>> > io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>>> > 15:27:29,356 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with
>>>>> > exception.
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
>>>>> > Connection unexpectedly closed by remote task manager
>>>>> > 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might
>>>>> > indicate
>>>>> > that the remote task manager was lost.
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
>>>>> >  at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
>>>>> >         at
>>>>> >
>>>>> > io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>>>>> >         at
>>>>> > io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>>>> >         at
>>>>> >
>>>>> > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>>>>> >         at java.lang.Thread.run(Thread.java:745)
>>>>> > 15:27:29,518 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Cancelling all computations and discarding all cached data.
>>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>>> > Unnamed
>>>>> > (17/25)
>>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state
>>>>> > FAILED
>>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>>> > Flat
>>>>> > Map (20/25)
>>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Source: Read Text File Source -> Flat Map (20/25) switched to
>>>>> > FAILED with
>>>>> > exception.
>>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>>> > disconnects
>>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>>> > JobManager is no longer reachable
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>> >         at
>>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> >         at
>>>>> >
>>>>> > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>> >         at
>>>>> > akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>> >         at
>>>>> > akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> >         at
>>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>> >  at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>>> > (15/25)
>>>>> > 15:27:29,525 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>>> > (17/25)
>>>>> > 15:27:29,526 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>>> > (16/25)
>>>>> > 15:27:29,527 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed
>>>>> > (18/25)
>>>>> > 15:27:29,528 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>>> > -> Flat
>>>>> > Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
>>>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>>> > Flat
>>>>> > Map (6/25)
>>>>> > 15:27:29,531 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED
>>>>> > with
>>>>> > exception.
>>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>>> > disconnects
>>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>>> > JobManager is no longer reachable
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>> >         at
>>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> >         at
>>>>> >
>>>>> > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>> >         at
>>>>> > akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>> >         at
>>>>> > akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> >         at
>>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> > 15:27:29,532 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Source: Read Text File Source -> Flat
>>>>> > Map
>>>>> > (20/25)
>>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>>> > -> Flat
>>>>> > Map (6/25) (2151d07383014506c5f1b0283b1986de).
>>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>>> > Unnamed
>>>>> > (15/25)
>>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state
>>>>> > FAILED
>>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>>> > Flat
>>>>> > Map (13/25)
>>>>> > 15:27:29,535 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Source: Read Text File Source -> Flat Map (13/25) switched to
>>>>> > FAILED with
>>>>> > exception.
>>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>>> > disconnects
>>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>>> > JobManager is no longer reachable
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>> >         at
>>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> >         at
>>>>> >
>>>>> > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>> >         at
>>>>> > akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>> >         at
>>>>> > akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> >         at
>>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> > 15:27:29,536 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Source: Read Text File Source -> Flat
>>>>> > Map
>>>>> > (6/25)
>>>>> > 15:27:29,538 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>>> > -> Flat
>>>>> > Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
>>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>>> > Unnamed
>>>>> > (16/25)
>>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state
>>>>> > FAILED
>>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Source: Read Text File Source ->
>>>>> > Flat
>>>>> > Map (24/25)
>>>>> > 15:27:29,539 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Source: Read Text File Source -> Flat Map (24/25) switched to
>>>>> > FAILED with
>>>>> > exception.
>>>>> > java.lang.Exception: TaskManager akka://flink/user/taskmanager
>>>>> > disconnects
>>>>> > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager:
>>>>> > JobManager is no longer reachable
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>>> >         at
>>>>> >
>>>>> > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>>> >         at
>>>>> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>>> >         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> >         at
>>>>> >
>>>>> > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
>>>>> >         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> >         at
>>>>> >
>>>>> > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>> >         at
>>>>> > akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>> >         at
>>>>> > akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>> >         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>> >         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> >         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> >         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> >         at
>>>>> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> >         at
>>>>> >
>>>>> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Triggering cancellation of task code Source: Read Text File Source
>>>>> > -> Flat
>>>>> > Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
>>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Attempting to fail task externally Keyed Aggregation -> Sink:
>>>>> > Unnamed
>>>>> > (18/25)
>>>>> > 15:27:29,541 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state
>>>>> > FAILED
>>>>> > 15:27:29,542 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Source: Read Text File Source -> Flat
>>>>> > Map
>>>>> > (24/25)
>>>>> > 15:27:29,543 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>> > - Freeing task resources for Source: Read Text File Source -> Flat
>>>>> > Map
>>>>> > (13/25)
>>>>> > 15:27:29,551 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Disassociating from JobManager
>>>>> > 15:27:29,572 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has a UID that has been quarantined. Association
>>>>> > aborted.
>>>>> > 15:27:29,582 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:27:29,587 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:27:29,623 INFO
>>>>> > org.apache.flink.runtime.io.network.netty.NettyClient
>>>>> > - Successful shutdown (took 17 ms).
>>>>> > 15:27:29,625 INFO
>>>>> > org.apache.flink.runtime.io.network.netty.NettyServer
>>>>> > - Successful shutdown (took 2 ms).
>>>>> > 15:27:29,640 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1,
>>>>> > timeout:
>>>>> > 500 milliseconds)
>>>>> > 15:27:30,157 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2,
>>>>> > timeout:
>>>>> > 1000 milliseconds)
>>>>> > 15:27:31,177 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3,
>>>>> > timeout:
>>>>> > 2000 milliseconds)
>>>>> > 15:27:33,193 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4,
>>>>> > timeout:
>>>>> > 4000 milliseconds)
>>>>> > 15:27:37,203 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5,
>>>>> > timeout:
>>>>> > 8000 milliseconds)
>>>>> > 15:27:37,218 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:27:45,215 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6,
>>>>> > timeout:
>>>>> > 16000 milliseconds)
>>>>> > 15:27:45,235 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:28:01,223 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7,
>>>>> > timeout: 30
>>>>> > seconds)
>>>>> > 15:28:01,237 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:28:31,243 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8,
>>>>> > timeout: 30
>>>>> > seconds)
>>>>> > 15:28:31,261 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:28:45,960 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:29:01,253 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9,
>>>>> > timeout: 30
>>>>> > seconds)
>>>>> > 15:29:01,273 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:30:11,545 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>>>>> > - Trying to register at JobManager
>>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10,
>>>>> > timeout:
>>>>> > 30 seconds)
>>>>> > 15:30:11,562 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> > 15:30:25,958 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000
>>>>> > ms,
>>>>> > all messages to this address will be delivered to dead letters.
>>>>> > Reason: The
>>>>> > remote system has quarantined this system. No further associations to
>>>>> > the
>>>>> > remote system are possible until this system is restarted.
>>>>> >
>>>>> > Could anyone explain what is the reason for the systems getting
>>>>> > disassociated and be quarantined?
>>>>> >
>>>>> > Kind Regards,
>>>>> > Ravinder Kaur
>>>>> >
>>>>
>>>>
>>>
>>
>