Hello All,
I'm running the WordCount example streaming job, and it fails because TaskManagers are lost.

Going through the TaskManager logs, I found the following messages:

15:14:26,592 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoint to jobmanager)
15:14:26,703 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (15/25) switched to RUNNING
15:14:26,709 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (17/25) switched to RUNNING
15:14:26,712 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (16/25) switched to RUNNING
15:14:26,714 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (18/25) switched to RUNNING
15:27:29,356 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (15/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at java.lang.Thread.run(Thread.java:745)
15:27:29,401 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
15:27:29,384 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.156:6123]
15:27:29,356 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (18/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at java.lang.Thread.run(Thread.java:745)
15:27:29,356 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (16/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at java.lang.Thread.run(Thread.java:745)
15:27:29,356 INFO org.apache.flink.runtime.taskmanager.Task - Keyed Aggregation -> Sink: Unnamed (17/25) switched to FAILED with exception.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'vm-10-155-208-135.cloud.mwn.de/10.155.208.135:37028'. This might indicate that the remote task manager was lost.
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
    at java.lang.Thread.run(Thread.java:745)
15:27:29,518 INFO org.apache.flink.runtime.taskmanager.TaskManager - Cancelling all computations and discarding all cached data.
15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (17/25)
15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED
15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Read Text File Source -> Flat Map (20/25)
15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
    at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,526 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25)
15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25)
15:27:29,526 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25)
15:27:29,527 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25)
15:27:29,528 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52).
15:27:29,531 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Read Text File Source -> Flat Map (6/25)
15:27:29,531 INFO org.apache.flink.runtime.taskmanager.Task - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
    at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,532 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Read Text File Source -> Flat Map (20/25)
15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de).
15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (15/25)
15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED
15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Read Text File Source -> Flat Map (13/25)
15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
    at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,536 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Read Text File Source -> Flat Map (6/25)
15:27:29,538 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4).
15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (16/25)
15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED
15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Source: Read Text File Source -> Flat Map (24/25)
15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with exception.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: JobManager is no longer reachable
    at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
    at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
    at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
    at akka.actor.ActorCell.invoke(ActorCell.scala:486)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2).
15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed (18/25)
15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED
15:27:29,542 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Read Text File Source -> Flat Map (24/25)
15:27:29,543 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Read Text File Source -> Flat Map (13/25)
15:27:29,551 INFO org.apache.flink.runtime.taskmanager.TaskManager - Disassociating from JobManager
15:27:29,572 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:27:29,582 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:29,587 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:29,623 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 17 ms).
15:27:29,625 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
15:27:29,640 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
15:27:30,157 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
15:27:31,177 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
15:27:33,193 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
15:27:37,203 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
15:27:37,218 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:27:45,215 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)
15:27:45,235 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:01,223 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30 seconds)
15:28:01,237 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:31,243 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30 seconds)
15:28:31,261 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:28:45,960 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:29:01,253 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
15:29:01,273 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:30:11,545 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout: 30 seconds)
15:30:11,562 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
15:30:25,958 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

Could anyone explain why the systems are getting disassociated and quarantined?

Kind Regards,
Ravinder Kaur
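
For readers following the registration attempts in the log: the logged timeouts start at 500 milliseconds, double on each attempt up to 16000 milliseconds, and then stay at 30 seconds. The Python sketch below only reproduces that visible pattern. The function name `registration_timeouts_ms`, the doubling rule, and the cap are assumptions read off the log lines, not Flink's actual implementation.

```python
from itertools import islice
from typing import Iterator


def registration_timeouts_ms(initial_ms: int = 500, cap_ms: int = 30_000) -> Iterator[int]:
    """Yield timeouts that double after each attempt until they reach the cap.

    Both defaults are taken from the log excerpt (500 ms first, 30 s later);
    the doubling rule is an assumption that happens to match the logged values.
    """
    timeout = initial_ms
    while True:
        yield timeout
        timeout = min(timeout * 2, cap_ms)


# First ten attempts, to compare with the "attempt N, timeout: ..." log lines.
first_ten = list(islice(registration_timeouts_ms(), 10))
for attempt, timeout in enumerate(first_ten, start=1):
    print(f"attempt {attempt}, timeout: {timeout} ms")
```

Note that the Remoting warnings state that no further associations are possible until the quarantined system is restarted, so by the log's own wording these retries would not be expected to succeed on their own.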
Hey Ravinder,
can you please share the JobManager logs as well?

The logs say that the TaskManager disconnects from the JobManager, because that one is not reachable anymore. At this point, the running shuffles are cancelled and you see the follow up RemoteTransportExceptions.

– Ufuk

On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello All,
>
> I'm running the WordCount example streaming job and it fails because of loss
> of Taskmanagers.
> [...]
> 15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed > (17/25) > 15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task > - Task Keyed Aggregation -> Sink: Unnamed (17/25) is already in state FAILED > 15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Read Text File Source -> Flat > Map (20/25) > 15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Read Text File Source -> Flat Map (20/25) switched to FAILED with > exception. > java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: > JobManager is no longer reachable > at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > at 
akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 15:27:29,526 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (15/25) > 15:27:29,525 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (17/25) > 15:27:29,526 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (16/25) > 15:27:29,527 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Keyed Aggregation -> Sink: Unnamed (18/25) > 15:27:29,528 INFO org.apache.flink.runtime.taskmanager.Task > - Triggering cancellation of task code Source: Read Text File Source -> Flat > Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52). > 15:27:29,531 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Read Text File Source -> Flat > Map (6/25) > 15:27:29,531 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Read Text File Source -> Flat Map (6/25) switched to FAILED with > exception. 
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: > JobManager is no longer reachable > at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 15:27:29,532 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: Read Text File Source -> Flat Map > (20/25) > 15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task > - Triggering cancellation of task code Source: Read Text File Source -> Flat > Map (6/25) (2151d07383014506c5f1b0283b1986de). > 15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed > (15/25) > 15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task > - Task Keyed Aggregation -> Sink: Unnamed (15/25) is already in state FAILED > 15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Read Text File Source -> Flat > Map (13/25) > 15:27:29,535 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Read Text File Source -> Flat Map (13/25) switched to FAILED with > exception. 
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: > JobManager is no longer reachable > at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 15:27:29,536 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: Read Text File Source -> Flat Map > (6/25) > 15:27:29,538 INFO org.apache.flink.runtime.taskmanager.Task > - Triggering cancellation of task code Source: Read Text File Source -> Flat > Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4). > 15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed > (16/25) > 15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task > - Task Keyed Aggregation -> Sink: Unnamed (16/25) is already in state FAILED > 15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Source: Read Text File Source -> Flat > Map (24/25) > 15:27:29,539 INFO org.apache.flink.runtime.taskmanager.Task > - Source: Read Text File Source -> Flat Map (24/25) switched to FAILED with > exception. 
> java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects > from JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager: > JobManager is no longer reachable > at > org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task > - Triggering cancellation of task code Source: Read Text File Source -> Flat > Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2). > 15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Keyed Aggregation -> Sink: Unnamed > (18/25) > 15:27:29,541 INFO org.apache.flink.runtime.taskmanager.Task > - Task Keyed Aggregation -> Sink: Unnamed (18/25) is already in state FAILED > 15:27:29,542 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: Read Text File Source -> Flat Map > (24/25) > 15:27:29,543 INFO org.apache.flink.runtime.taskmanager.Task > - Freeing task resources for Source: Read Text File Source -> Flat Map > (13/25) > 15:27:29,551 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Disassociating from JobManager > 15:27:29,572 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has a UID that has been quarantined. Association aborted. > 15:27:29,582 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. No further associations to the > remote system are possible until this system is restarted. > 15:27:29,587 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. 
No further associations to the > remote system are possible until this system is restarted. > 15:27:29,623 INFO org.apache.flink.runtime.io.network.netty.NettyClient > - Successful shutdown (took 17 ms). > 15:27:29,625 INFO org.apache.flink.runtime.io.network.netty.NettyServer > - Successful shutdown (took 2 ms). > 15:27:29,640 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 1, timeout: > 500 milliseconds) > 15:27:30,157 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 2, timeout: > 1000 milliseconds) > 15:27:31,177 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 3, timeout: > 2000 milliseconds) > 15:27:33,193 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 4, timeout: > 4000 milliseconds) > 15:27:37,203 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 5, timeout: > 8000 milliseconds) > 15:27:37,218 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. No further associations to the > remote system are possible until this system is restarted. 
> 15:27:45,215 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 6, timeout: > 16000 milliseconds) > 15:27:45,235 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. No further associations to the > remote system are possible until this system is restarted. > 15:28:01,223 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 7, timeout: 30 > seconds) > 15:28:01,237 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. No further associations to the > remote system are possible until this system is restarted. > 15:28:31,243 INFO org.apache.flink.runtime.taskmanager.TaskManager > - Trying to register at JobManager > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 8, timeout: 30 > seconds) > 15:28:31,261 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. Reason: The > remote system has quarantined this system. No further associations to the > remote system are possible until this system is restarted. > 15:28:45,960 WARN Remoting > - Tried to associate with unreachable remote address > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, > all messages to this address will be delivered to dead letters. 
Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
> 15:29:01,253 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, timeout: 30 seconds)
> 15:29:01,273 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
> 15:30:11,545 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, timeout: 30 seconds)
> 15:30:11,562 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
> 15:30:25,958 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
>
> Could anyone explain what is the reason for the systems getting disassociated and be quarantined?
>
> Kind Regards,
> Ravinder Kaur
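As an aside on the registration attempts in the TaskManager log: the logged timeouts (500 ms, 1000 ms, 2000 ms, 4000 ms, 8000 ms, 16000 ms, then 30 seconds for every later attempt) look like a doubling backoff with a 30-second cap. Below is a minimal Python sketch that reproduces that sequence. The function name and parameters are hypothetical, and this is not Flink's actual code, only an illustration of the pattern visible in the log.

```python
def registration_timeouts(attempts, initial_ms=500, cap_ms=30_000):
    """Return the timeout in milliseconds for each registration attempt.

    Assumption (inferred from the log, not taken from Flink's source):
    the timeout starts at initial_ms, doubles after every attempt,
    and never exceeds cap_ms.
    """
    timeouts = []
    timeout = initial_ms
    for _ in range(attempts):
        timeouts.append(timeout)
        # Double for the next attempt, but clamp to the cap.
        timeout = min(timeout * 2, cap_ms)
    return timeouts


if __name__ == "__main__":
    for attempt, timeout in enumerate(registration_timeouts(10), start=1):
        print(f"attempt {attempt}: timeout {timeout} ms")
```

Note that, according to the Remoting warnings in the same log, retrying cannot succeed while the remote system has this one quarantined; the message states that no further associations are possible until this system is restarted.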
Hi Ufuk,

Here is the log of the JobManager:

15:14:26,030 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from SCHEDULED to DEPLOYING
15:14:26,030 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157
15:14:26,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CREATED to SCHEDULED
15:14:26,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from SCHEDULED to DEPLOYING
15:14:26,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157
15:14:26,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from SCHEDULED to DEPLOYING
15:14:26,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2
15:14:26,082 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from DEPLOYING to RUNNING
15:14:26,091 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from DEPLOYING to RUNNING
15:14:26,093 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from DEPLOYING to RUNNING
15:14:26,094 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from DEPLOYING to RUNNING
15:14:26,130 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CREATED to SCHEDULED
15:14:26,142 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CREATED to SCHEDULED
15:14:26,147 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from SCHEDULED to DEPLOYING
15:14:26,148 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157
15:14:26,168 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from SCHEDULED to DEPLOYING
15:14:26,168 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135
15:14:26,169 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CREATED to SCHEDULED
15:14:26,171 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from SCHEDULED to DEPLOYING
15:14:26,171 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135
15:14:26,177 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CREATED to SCHEDULED
15:14:26,192 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from SCHEDULED to DEPLOYING
15:14:26,192 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135
15:14:26,247 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CREATED to SCHEDULED
15:14:26,259 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from SCHEDULED to DEPLOYING
15:14:26,260 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135
15:14:26,520 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from DEPLOYING to RUNNING
15:14:26,530 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from DEPLOYING to RUNNING
15:14:26,530 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from DEPLOYING to RUNNING
15:14:26,739 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from DEPLOYING to RUNNING
15:14:26,740 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from DEPLOYING to RUNNING
15:14:26,747 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from DEPLOYING to RUNNING
15:14:26,748 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from DEPLOYING to RUNNING
15:14:26,748 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from DEPLOYING to RUNNING
15:14:26,749 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from DEPLOYING to RUNNING
15:14:26,755 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from DEPLOYING to RUNNING
15:14:26,775 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from DEPLOYING to RUNNING
15:14:26,776 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from DEPLOYING to RUNNING
15:14:26,776 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from DEPLOYING to RUNNING
15:14:26,825 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from DEPLOYING to RUNNING
15:14:27,099 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from DEPLOYING to RUNNING
15:14:27,102 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from DEPLOYING to RUNNING
15:14:27,103 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from DEPLOYING to RUNNING
15:22:36,119 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) switched from RUNNING to FINISHED
15:22:43,697 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) switched from RUNNING to FINISHED
15:22:46,998 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) switched from RUNNING to FINISHED
15:23:00,969 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) switched from RUNNING to FINISHED
15:23:25,615 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) switched from RUNNING to FINISHED
15:23:26,734 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) switched from RUNNING to FINISHED
15:24:04,736 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) switched from RUNNING to FINISHED
15:24:16,436 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) switched from RUNNING to FINISHED
15:24:17,705 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) switched from RUNNING to FINISHED
15:27:05,900 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,913 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager terminated.
15:27:05,913 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) switched from RUNNING to FAILED
15:27:05,930 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from RUNNING to CANCELING
15:27:05,936 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from RUNNING to CANCELING
15:27:05,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from RUNNING to CANCELING
15:27:05,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from RUNNING to CANCELING
15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from RUNNING to CANCELING
15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from RUNNING to CANCELING
15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from RUNNING to CANCELING
15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed
Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from RUNNING to CANCELING 15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from RUNNING to CANCELING 15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from RUNNING to CANCELING 15:27:05,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from RUNNING to CANCELING 15:27:05,940 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from RUNNING to CANCELING 15:27:05,941 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from RUNNING to CANCELING 15:27:05,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from RUNNING to CANCELING 15:27:05,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from RUNNING to CANCELING 15:27:05,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from RUNNING to CANCELING 15:27:05,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from RUNNING to CANCELING 15:27:05,941 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from RUNNING to CANCELING 15:27:05,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from RUNNING to CANCELING 15:27:05,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from RUNNING to CANCELING 15:27:05,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from RUNNING to CANCELING 15:27:05,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from RUNNING to CANCELING 15:27:05,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (25/25) 
(bf8593ce3db7957427ad8e15e29fb975) switched from RUNNING to CANCELING 15:27:05,943 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) switched from CANCELING to FAILED 15:27:05,943 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) switched from CANCELING to FAILED 15:27:05,944 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) switched from CANCELING to FAILED 15:27:05,944 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) switched from CANCELING to FAILED 15:27:05,944 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) switched from CANCELING to FAILED 15:27:05,945 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) switched from CANCELING to FAILED 15:27:05,945 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) switched from CANCELING to FAILED 15:27:05,945 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21. 15:27:05,947 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING. java.lang.Exception: The slot in which the task was executed has been released. 
Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153) at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156) at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) at akka.actor.ActorCell.invoke(ActorCell.scala:486) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 15:27:06,020 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) switched from CANCELING to CANCELED 15:27:06,020 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) switched from CANCELING to CANCELED 15:27:06,021 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) switched from CANCELING to CANCELED 15:27:06,030 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) switched from CANCELING to CANCELED 15:27:06,031 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) switched from CANCELING to CANCELED 15:27:06,033 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) switched from CANCELING to CANCELED 15:27:06,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) switched from CANCELING to CANCELED 15:27:06,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) switched from CANCELING to CANCELED 15:27:06,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) switched from CANCELING to CANCELED 15:27:06,898 INFO 
org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager terminated. 15:27:06,899 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) switched from CANCELING to FAILED 15:27:06,900 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) switched from CANCELING to FAILED 15:27:06,897 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659] 15:27:06,902 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307] 15:27:06,902 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717] 15:27:06,902 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) switched from CANCELING to FAILED 15:27:06,904 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) switched from CANCELING to FAILED 15:27:06,905 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) switched from CANCELING to FAILED 15:27:06,906 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) switched from CANCELING to FAILED 15:27:06,916 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) switched from CANCELING to FAILED 15:27:06,917 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) switched from CANCELING to FAILED 15:27:06,917 INFO 
org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered task managers 5. Number of available slots 17. 15:27:06,918 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager terminated. 15:27:06,918 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) switched from CANCELING to FAILED 15:27:06,918 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) switched from CANCELING to FAILED 15:27:06,920 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) switched from CANCELING to FAILED 15:27:06,924 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) switched from CANCELING to FAILED 15:27:06,926 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) switched from CANCELING to FAILED 15:27:06,927 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) switched from CANCELING to FAILED 15:27:06,929 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) switched from CANCELING to FAILED 15:27:06,930 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) switched from CANCELING to FAILED 15:27:06,931 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager 
akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered task managers 4. Number of available slots 13. 15:27:06,931 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager terminated. 15:27:06,932 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) switched from CANCELING to FAILED 15:27:06,933 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) switched from CANCELING to FAILED 15:27:06,934 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) switched from CANCELING to FAILED 15:27:06,935 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) switched from CANCELING to FAILED 15:27:06,936 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) switched from CANCELING to FAILED 15:27:06,937 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) switched from CANCELING to FAILED 15:27:06,938 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) switched from CANCELING to FAILED 15:27:06,939 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) switched from CANCELING to FAILED 15:27:06,940 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered task managers 3. 
Number of available slots 9. 15:27:06,940 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILED. java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ slave2 - 4 slots - URL: akka.tcp://flink@10.155.208.136:42624/user/taskmanager at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153) at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156) at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) at 
akka.actor.ActorCell.invoke(ActorCell.scala:486)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:27:29,543 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,964 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,978 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:28:45,991 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:45,999 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:46,108 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:28:53,349 INFO org.apache.flink.runtime.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:28:53,349 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
15:30:25,988 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:25,992 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:30:26,090 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,994 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,996 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:05,999 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
15:32:06,007 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
15:32:11,743 INFO org.apache.flink.runtime.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
15:32:11,743 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
15:58:47,993 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,176 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:48,744 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:51,736 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,050 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15:58:54,502 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
15:58:54,509 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:55249

Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
Hey Ravinder,
|
Hello All,

After some debugging, the JobManager log suggests that the failed TaskManagers stopped sending heartbeat messages. After this, the TaskManagers (TMs) were detected as unreachable by the JobManager (JM).

If the actor system is dead, why is it not restarted by the supervisor while the job is running? The TMs are re-registered at the JM only after the job has switched to FAILED. The cause of the failure is still unclear. Can someone help diagnose the issue?

Kind Regards,
Ravinder Kaur

On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]> wrote:
|
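The sequence described above (TaskManagers detected as unreachable, then unregistered, then the job failing, then re-registration) can be pulled out of a long JobManager log with a small script. The following is only a rough sketch: the function name `failure_timeline` and the pattern table are made up for illustration, and the regular expressions assume the exact message wording seen in the log quoted in this thread, which may differ in other Flink versions.

```python
import re

# Timestamp prefix as it appears in the quoted log, e.g. "15:27:05,900".
TS = r"(?P<ts>\d{2}:\d{2}:\d{2},\d{3})"

# Hypothetical pattern table; the message texts are copied from the log in
# this thread and are an assumption about the log format, not a Flink API.
EVENT_PATTERNS = {
    "unreachable": re.compile(
        TS + r" WARN akka\.remote\.RemoteWatcher - Detected unreachable: "
        r"\[(?P<addr>[^\]]+)\]"),
    "unregistered": re.compile(
        TS + r" INFO \S+\.InstanceManager - Unregistered task manager "
        r"(?P<addr>\S+)/user/taskmanager\. "
        r"Number of registered task managers (?P<tms>\d+)\. "
        r"Number of available slots (?P<slots>\d+)\."),
    "job_status": re.compile(
        TS + r" INFO \S+\.JobManager - Status of job \S+ \(.*?\) "
        r"changed to (?P<status>[A-Z]+)"),
    "re_registered": re.compile(
        TS + r" INFO \S+\.InstanceManager - Registered TaskManager at \S+ "
        r"\((?P<addr>\S+)/user/taskmanager\) as [0-9a-f]+\. "
        r"Current number of registered hosts is (?P<tms>\d+)\. "
        r"Current number of alive task slots is (?P<slots>\d+)\."),
}


def failure_timeline(raw_log):
    """Return (timestamp, kind, fields) tuples sorted by timestamp."""
    # Collapse all whitespace so entries wrapped across lines still match.
    text = " ".join(raw_log.split())
    events = []
    for kind, pattern in EVENT_PATTERNS.items():
        for match in pattern.finditer(text):
            fields = match.groupdict()
            events.append((fields.pop("ts"), kind, fields))
    # HH:MM:SS,mmm strings sort chronologically within a single day.
    events.sort(key=lambda event: event[0])
    return events


# Three entries copied from the JobManager log above.
SAMPLE_LOG = """
15:27:05,900 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624]
15:27:05,945 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered task managers 6. Number of available slots 21.
15:27:05,947 INFO org.apache.flink.runtime.jobmanager.JobManager - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from SocketTextStream Example) changed to FAILING.
"""

if __name__ == "__main__":
    for ts, kind, fields in failure_timeline(SAMPLE_LOG):
        print(ts, kind, fields)
```

Run against the full JobManager log, this should list which TaskManager address became unreachable first and how the registered TaskManager and slot counts dropped before the job changed to FAILED, which makes it easier to pick the right TaskManager log to inspect next.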
Ravinder,

You could first try to run the standalone Flink examples without Hadoop, and then try to run Flink on Hadoop. Your trouble seems to point to the Hadoop cluster setup rather than to anything in Flink?

balaji

On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <[hidden email]> wrote:
|
Hello Balaji,

I'm running Flink on a standalone cluster and not on Hadoop. Hadoop is installed on these machines, but I do not use it, so it should not cause any problem.

I had been running batch and streaming jobs successfully for a while before this issue occurred. According to the logs I think it is related to Akka, but I'm not sure.

Kind Regards,
Ravinder Kaur

On Wed, Mar 23, 2016 at 1:06 PM, Balaji Rajagopalan <[hidden email]> wrote:
|
Hi Ravinder,
It would be interesting to see the log output of the disconnected task manager. Possibly, the task manager ran out of memory because the state of your program got too big. You can work around this by using a different state backend like RocksDB.

Cheers,
Max

On Wed, Mar 23, 2016 at 1:46 PM, Ravinder Kaur <[hidden email]> wrote:
> Hello Balaji,
>
> I'm running Flink on a standalone cluster and not on hadoop, though I had
> hadoop installed on these machines I do not use it. but that should not
> cause any problem.
>
> I have been successfully running Batch and Streaming jobs for a while now
> before this issue occured. According to the logs I think it is related to
> AKKA, but I'm not sure.
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Mar 23, 2016 at 1:06 PM, Balaji Rajagopalan
> <[hidden email]> wrote:
>>
>> Ravinder,
>> You could first try to fun standalone examples of flink without hadoop,
>> and then try to run flink on hadoop. Your trouble seems to point to some
>> hadoop cluster set up rather than anything to do with flink ?
>>
>> balaji
>>
>> On Wed, Mar 23, 2016 at 5:27 PM, Ravinder Kaur <[hidden email]>
>> wrote:
>>>
>>> Hello All,
>>>
>>> After trying to debug, the log of jobmanager suggests that the failed
>>> taskmanagers has stopped sending heartbeat messages. After this the TMs were
>>> detected unreachable by the JM.
>>>
>>> If the actor system is dead why is it not restarted by the supervisor
>>> during the Job? The TM is re-registered at the JM only after the job is
>>> switched to failed. The failure is still unclear. Can someone help diagnose
>>> the issue?
>>> >>> Kind Regards, >>> Ravinder Kaur >>> >>> >>> >>> >>> On Mon, Mar 21, 2016 at 10:41 PM, Ravinder Kaur <[hidden email]> >>> wrote: >>>> >>>> Hi Ufuk, >>>> >>>> Here is the log of the JobManager >>>> >>>> 15:14:26,030 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,030 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (19/25) (attempt #0) to vm-10-155-208-157 >>>> 15:14:26,035 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,036 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,036 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (20/25) (attempt #0) to vm-10-155-208-157 >>>> 15:14:26,039 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,039 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (18/25) (attempt #0) to slave2 >>>> 15:14:26,082 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,091 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,093 INFO 
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,094 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,130 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,142 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,147 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,148 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (21/25) (attempt #0) to vm-10-155-208-157 >>>> 15:14:26,168 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,168 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (22/25) (attempt #0) to vm-10-155-208-135 >>>> 15:14:26,169 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,171 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) >>>> switched from SCHEDULED to DEPLOYING 
>>>> 15:14:26,171 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (23/25) (attempt #0) to vm-10-155-208-135 >>>> 15:14:26,177 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,192 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,192 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (24/25) (attempt #0) to vm-10-155-208-135 >>>> 15:14:26,247 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) >>>> switched from CREATED to SCHEDULED >>>> 15:14:26,259 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) >>>> switched from SCHEDULED to DEPLOYING >>>> 15:14:26,260 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying >>>> Keyed Aggregation -> Sink: Unnamed (25/25) (attempt #0) to vm-10-155-208-135 >>>> 15:14:26,520 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,530 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,530 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) >>>> switched from DEPLOYING to RUNNING >>>> 
15:14:26,739 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,740 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,747 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,748 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,748 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,749 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,755 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,775 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,776 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,776 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (18/25) 
(382440047bd2ff051c51e546dad90ea9) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:26,825 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:27,099 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:27,102 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) >>>> switched from DEPLOYING to RUNNING >>>> 15:14:27,103 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) >>>> switched from DEPLOYING to RUNNING >>>> 15:22:36,119 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (1/25) (7b81553dbfad2211d6860cff187b0fd4) >>>> switched from RUNNING to FINISHED >>>> 15:22:43,697 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (8/25) (dbcf94270f3b6c748cd7fb42d62087d0) >>>> switched from RUNNING to FINISHED >>>> 15:22:46,998 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (15/25) (5a9126d43ee0622f7fc24b541c4da568) >>>> switched from RUNNING to FINISHED >>>> 15:23:00,969 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (2/25) (d08cae46dcb5a9dc621161ec1c80a79e) >>>> switched from RUNNING to FINISHED >>>> 15:23:25,615 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (9/25) (9011ead6d1309e3d5b957a6f827ee58d) >>>> switched from RUNNING to FINISHED >>>> 15:23:26,734 INFO 
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (16/25) (a490f46d6eaac7de26e2461f81246559) >>>> switched from RUNNING to FINISHED >>>> 15:24:04,736 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (10/25) (f7659a6f39daa8831458e3bd612ad16a) >>>> switched from RUNNING to FINISHED >>>> 15:24:16,436 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (17/25) (688b6dc1ed6c0bed284b751bc45bff6d) >>>> switched from RUNNING to FINISHED >>>> 15:24:17,705 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (3/25) (004de7884dc33d95e79337ad05774cf8) >>>> switched from RUNNING to FINISHED >>>> 15:27:05,900 WARN akka.remote.RemoteWatcher >>>> - Detected unreachable: [akka.tcp://flink@10.155.208.136:42624] >>>> 15:27:05,913 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Task manager akka.tcp://flink@10.155.208.136:42624/user/taskmanager >>>> terminated. 
>>>> 15:27:05,913 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (20/25) (9cd65cbf15aafe0241b9d59e48b79f52) >>>> switched from RUNNING to FAILED >>>> 15:27:05,930 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,936 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> 
Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO 
>>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (12/25) (d6eeb17c2105a091e42aeae7f3587fdd) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,940 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,941 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,942 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (21/25) 
(63fd4446c027aa10248f828871201a0d) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,942 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,942 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,942 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,942 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) >>>> switched from RUNNING to CANCELING >>>> 15:27:05,943 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (16/25) (92b86c8f173898de99c02b2a4613a4a2) >>>> switched from CANCELING to FAILED >>>> 15:27:05,943 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (24/25) (96df0f3c0990c5a573c67a31a6207fa2) >>>> switched from CANCELING to FAILED >>>> 15:27:05,944 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (15/25) (a65d4b2835d31143889e386a8704334d) >>>> switched from CANCELING to FAILED >>>> 15:27:05,944 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (17/25) (bc78671ffc663b8f43c8a46beb3a0d5c) >>>> switched from CANCELING to FAILED >>>> 15:27:05,944 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (13/25) (a6a0f05e0362dbf0505c1439893e53e4) >>>> switched from CANCELING to FAILED >>>> 15:27:05,945 INFO >>>> 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (6/25) (2151d07383014506c5f1b0283b1986de) >>>> switched from CANCELING to FAILED >>>> 15:27:05,945 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (18/25) (382440047bd2ff051c51e546dad90ea9) >>>> switched from CANCELING to FAILED >>>> 15:27:05,945 INFO org.apache.flink.runtime.instance.InstanceManager >>>> - Unregistered task manager >>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager. Number of registered >>>> task managers 6. Number of available slots 21. >>>> 15:27:05,947 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from >>>> SocketTextStream Example) changed to FAILING. >>>> java.lang.Exception: The slot in which the task was executed has been >>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ >>>> slave2 - 4 slots - URL: >>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager >>>> at >>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153) >>>> at >>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) >>>> at >>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) >>>> at >>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156) >>>> at >>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215) >>>> at >>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696) >>>> at >>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) >>>> at >>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) >>>> at >>>> 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) >>>> at >>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) >>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465) >>>> at >>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) >>>> at >>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) >>>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) >>>> at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:486) >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221) >>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>> 15:27:06,020 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (9/25) (46f7e3af97a70041ad878ca24992cc0a) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,020 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (10/25) (528fee08fc63c38ddd9ef0347cee570e) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,021 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed 
(12/25) (d6eeb17c2105a091e42aeae7f3587fdd) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,030 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (19/25) (25cd0a9effb94d3c23fc4b35e45971e7) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,031 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (20/25) (eab969f46a8ce40ff6544a1c39663455) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,033 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (21/25) (63fd4446c027aa10248f828871201a0d) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,034 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (11/25) (1ad15af7f337f751eeae6e9ad38bb062) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,034 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (13/25) (1fe93db943e87964f4e379b6f4f8878d) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,035 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (14/25) (564818d5f16b250aaee285b81320c7ea) >>>> switched from CANCELING to CANCELED >>>> 15:27:06,898 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Task manager akka.tcp://flink@10.155.208.137:56659/user/taskmanager >>>> terminated. 
>>>> 15:27:06,899 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (19/25) (332523796c07c7a8e14a6acc0d651538) >>>> switched from CANCELING to FAILED >>>> 15:27:06,900 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (2/25) (680cbf0c9c3de3d7610747298f2de3af) >>>> switched from CANCELING to FAILED >>>> 15:27:06,897 WARN akka.remote.RemoteWatcher >>>> - Detected unreachable: [akka.tcp://flink@10.155.208.137:56659] >>>> 15:27:06,902 WARN akka.remote.RemoteWatcher >>>> - Detected unreachable: [akka.tcp://flink@10.155.208.135:56307] >>>> 15:27:06,902 WARN akka.remote.RemoteWatcher >>>> - Detected unreachable: [akka.tcp://flink@10.155.208.138:55717] >>>> 15:27:06,902 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (12/25) (1b8401bd1721755149014feeb3366d91) >>>> switched from CANCELING to FAILED >>>> 15:27:06,904 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (3/25) (a0a635a93364f40fb666bb8a5cd66907) >>>> switched from CANCELING to FAILED >>>> 15:27:06,905 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (23/25) (7fef73c6ffbefd45f0d57007f7d85977) >>>> switched from CANCELING to FAILED >>>> 15:27:06,906 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (1/25) (5455ef7528959e20d37d497038536dd7) >>>> switched from CANCELING to FAILED >>>> 15:27:06,916 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (5/25) (4a1547a3bc7ef7eaffa682f3cc255b40) >>>> switched from CANCELING to FAILED >>>> 15:27:06,917 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (4/25) (6f50674481dc0ff1e9ff49cde5f3cd4b) >>>> 
switched from CANCELING to FAILED >>>> 15:27:06,917 INFO org.apache.flink.runtime.instance.InstanceManager >>>> - Unregistered task manager >>>> akka.tcp://flink@10.155.208.137:56659/user/taskmanager. Number of registered >>>> task managers 5. Number of available slots 17. >>>> 15:27:06,918 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Task manager akka.tcp://flink@10.155.208.135:56307/user/taskmanager >>>> terminated. >>>> 15:27:06,918 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (4/25) (b1fc9f23faa6275ddd14d4537bceb7ef) >>>> switched from CANCELING to FAILED >>>> 15:27:06,918 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (25/25) (bf8593ce3db7957427ad8e15e29fb975) >>>> switched from CANCELING to FAILED >>>> 15:27:06,920 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (11/25) (e1179250546953f4199626d73c3062dd) >>>> switched from CANCELING to FAILED >>>> 15:27:06,924 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (24/25) (1d8b303a7cf313e110812336e98d577c) >>>> switched from CANCELING to FAILED >>>> 15:27:06,926 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (18/25) (6d4e73f0e2b9cc164a464610a4976505) >>>> switched from CANCELING to FAILED >>>> 15:27:06,927 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (23/25) (a35248ddf4ac4b296464ace643154664) >>>> switched from CANCELING to FAILED >>>> 15:27:06,929 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (22/25) (aa668d03252e7c9cd70e78399002fc31) >>>> switched from CANCELING to FAILED >>>> 15:27:06,930 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: 
Read >>>> Text File Source -> Flat Map (22/25) (1573e487f0ed3ffd648ee460c7b78e64) >>>> switched from CANCELING to FAILED >>>> 15:27:06,931 INFO org.apache.flink.runtime.instance.InstanceManager >>>> - Unregistered task manager >>>> akka.tcp://flink@10.155.208.135:56307/user/taskmanager. Number of registered >>>> task managers 4. Number of available slots 13. >>>> 15:27:06,931 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Task manager akka.tcp://flink@10.155.208.138:55717/user/taskmanager >>>> terminated. >>>> 15:27:06,932 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (14/25) (d71366c16e0d903b14b86e71ba9ba969) >>>> switched from CANCELING to FAILED >>>> 15:27:06,933 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (7/25) (b51d5cce7e2664bab04aa41d649cb632) >>>> switched from CANCELING to FAILED >>>> 15:27:06,934 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (6/25) (5522c46bb47fe0b0005833a25e9c9e72) >>>> switched from CANCELING to FAILED >>>> 15:27:06,935 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (21/25) (09bbdc8b4cb21d1b8a4ec5dc94fbf54f) >>>> switched from CANCELING to FAILED >>>> 15:27:06,936 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (7/25) (e5dede629edfb80e8342168c342c5a70) >>>> switched from CANCELING to FAILED >>>> 15:27:06,937 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (8/25) (634b686958f682aca02bee1c01a978c2) >>>> switched from CANCELING to FAILED >>>> 15:27:06,938 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Read >>>> Text File Source -> Flat Map (25/25) (d7c01a3fc5049263833d33457d57f6b4) >>>> switched from CANCELING to FAILED 
>>>> 15:27:06,939 INFO >>>> org.apache.flink.runtime.executiongraph.ExecutionGraph - Keyed >>>> Aggregation -> Sink: Unnamed (5/25) (edbb3ee3eb7c0a1977b3dba20aee1d7f) >>>> switched from CANCELING to FAILED >>>> 15:27:06,940 INFO org.apache.flink.runtime.instance.InstanceManager >>>> - Unregistered task manager >>>> akka.tcp://flink@10.155.208.138:55717/user/taskmanager. Number of registered >>>> task managers 3. Number of available slots 9. >>>> 15:27:06,940 INFO org.apache.flink.runtime.jobmanager.JobManager >>>> - Status of job 6b4b496acabca01aa6690fa6b156b184 (WordCount from >>>> SocketTextStream Example) changed to FAILED. >>>> java.lang.Exception: The slot in which the task was executed has been >>>> released. Probably loss of TaskManager 07b034c041457fb1cb9eba017be8408e @ >>>> slave2 - 4 slots - URL: >>>> akka.tcp://flink@10.155.208.136:42624/user/taskmanager >>>> at >>>> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:153) >>>> at >>>> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) >>>> at >>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) >>>> at >>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156) >>>> at >>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215) >>>> at >>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696) >>>> at >>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) >>>> at >>>> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) >>>> at >>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) >>>> at 
>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>> at >>>> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) >>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465) >>>> at >>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) >>>> at >>>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) >>>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) >>>> at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:486) >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221) >>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>> at >>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>> 15:27:29,543 INFO Remoting >>>> - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still >>>> unreachable or has not been restarted. Keeping it quarantined. >>>> 15:28:45,964 INFO Remoting >>>> - Quarantined address [akka.tcp://flink@10.155.208.135:56307] is still >>>> unreachable or has not been restarted. Keeping it quarantined. >>>> 15:28:45,978 WARN Remoting >>>> - Tried to associate with unreachable remote address >>>> [akka.tcp://flink@10.155.208.135:56307]. Address is now gated for 5000 ms, >>>> all messages to this address will be delivered to dead letters. Reason: The >>>> remote system has a UID that has been quarantined. Association aborted. 
>>>> 15:28:45,991 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:45,999 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:46,108 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:28:53,349 INFO org.apache.flink.runtime.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@10.155.208.135:56307/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
>>>> 15:28:53,349 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at vm-10-155-208-135 (akka.tcp://flink@10.155.208.135:56307/user/taskmanager) as 9b6267856b55ca2937bab02b48e464a8. Current number of registered hosts is 4. Current number of alive task slots is 13.
>>>> 15:30:25,988 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:30:25,992 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:30:26,090 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,994 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.137:56659] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,996 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.138:55717] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:05,999 INFO Remoting - Quarantined address [akka.tcp://flink@10.155.208.136:42624] is still unreachable or has not been restarted. Keeping it quarantined.
>>>> 15:32:06,007 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.155.208.136:42624]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
>>>> 15:32:11,743 INFO org.apache.flink.runtime.instance.InstanceManager - Registering TaskManager at akka.tcp://flink@10.155.208.136:42624/user/taskmanager which was marked as dead earlier because of a heart-beat timeout.
>>>> 15:32:11,743 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at slave2 (akka.tcp://flink@10.155.208.136:42624/user/taskmanager) as ff5d97a4837980371a14afa2b0f0823b. Current number of registered hosts is 5. Current number of alive task slots is 17.
>>>> 15:58:47,993 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.156:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:48,176 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.157:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:48,744 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.158:33728] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:51,736 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.135:56307] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:54,050 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.155.208.136:42624] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>>> 15:58:54,502 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-0dd9cb75-526a-4a95-a920-194ed24b0389
>>>> 15:58:54,509 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:55249
>>>>
>>>> Kind Regards,
>>>> Ravinder Kaur
>>>>
>>>> On Mon, Mar 21, 2016 at 10:23 PM, Ufuk Celebi <[hidden email]> wrote:
>>>>>
>>>>> Hey Ravinder,
>>>>>
>>>>> can you please share the JobManager logs as well?
>>>>>
>>>>> The logs say that the TaskManager disconnects from the JobManager, because that one is not reachable anymore. At this point, the running shuffles are cancelled and you see the follow up RemoteTransportExceptions.
>>>>>
>>>>> – Ufuk
>>>>>
>>>>> On Mon, Mar 21, 2016 at 5:42 PM, Ravinder Kaur <[hidden email]> wrote:
>>>>> > Hello All,
>>>>> >
>>>>> > I'm running the WordCount example streaming job and it fails because of loss of Taskmanagers.
>>>>> > 15:28:45,960 WARN Remoting >>>>> > - Tried to associate with unreachable remote address >>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 >>>>> > ms, >>>>> > all messages to this address will be delivered to dead letters. >>>>> > Reason: The >>>>> > remote system has quarantined this system. No further associations to >>>>> > the >>>>> > remote system are possible until this system is restarted. >>>>> > 15:29:01,253 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>> > - Trying to register at JobManager >>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 9, >>>>> > timeout: 30 >>>>> > seconds) >>>>> > 15:29:01,273 WARN Remoting >>>>> > - Tried to associate with unreachable remote address >>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 >>>>> > ms, >>>>> > all messages to this address will be delivered to dead letters. >>>>> > Reason: The >>>>> > remote system has quarantined this system. No further associations to >>>>> > the >>>>> > remote system are possible until this system is restarted. >>>>> > 15:30:11,545 INFO org.apache.flink.runtime.taskmanager.TaskManager >>>>> > - Trying to register at JobManager >>>>> > akka.tcp://flink@10.155.208.156:6123/user/jobmanager (attempt 10, >>>>> > timeout: >>>>> > 30 seconds) >>>>> > 15:30:11,562 WARN Remoting >>>>> > - Tried to associate with unreachable remote address >>>>> > [akka.tcp://flink@10.155.208.156:6123]. Address is now gated for 5000 >>>>> > ms, >>>>> > all messages to this address will be delivered to dead letters. >>>>> > Reason: The >>>>> > remote system has quarantined this system. No further associations to >>>>> > the >>>>> > remote system are possible until this system is restarted. >>>>> > 15:30:25,958 WARN Remoting >>>>> > - Tried to associate with unreachable remote address >>>>> > [akka.tcp://flink@10.155.208.156:6123]. 
Address is now gated for 5000 >>>>> > ms, >>>>> > all messages to this address will be delivered to dead letters. >>>>> > Reason: The >>>>> > remote system has quarantined this system. No further associations to >>>>> > the >>>>> > remote system are possible until this system is restarted. >>>>> > >>>>> > Could anyone explain what is the reason for the systems getting >>>>> > disassociated and be quarantined? >>>>> > >>>>> > Kind Regards, >>>>> > Ravinder Kaur >>>>> > >>>> >>>> >>> >> > |
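A note for anyone reading the "Trying to register at JobManager" lines in the quoted log: the timeout doubles on each attempt, starting at 500 milliseconds, and then stays at 30 seconds from attempt 7 onward. The Python sketch below reproduces that schedule. It is inferred only from the values printed in the log; the function name and its parameters are hypothetical, and this is not Flink's actual implementation.

```python
def registration_timeouts(attempts, initial_ms=500, cap_ms=30_000):
    """Return the timeout in milliseconds for each registration attempt.

    Assumption (read off the log, not taken from Flink's source): the
    timeout starts at initial_ms, doubles after every attempt, and is
    capped at cap_ms.
    """
    timeouts = []
    timeout = initial_ms
    for _ in range(attempts):
        timeouts.append(timeout)
        # Double for the next attempt, but never exceed the cap.
        timeout = min(timeout * 2, cap_ms)
    return timeouts


if __name__ == "__main__":
    for attempt, timeout in enumerate(registration_timeouts(10), start=1):
        print(f"attempt {attempt}: timeout {timeout} ms")
```

The retries themselves never succeed in this log: every attempt is followed by the same "remote system has quarantined this system" warning, which says no further associations are possible until this system is restarted.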