Task manager suddenly lost connection to JM


Task manager suddenly lost connection to JM

Hao Sun
Hi team, I am seeing a weird issue where one of my TMs suddenly loses its connection to the JM.
Once the job running on that TM is relocated to a new TM, it can reconnect to the JM again.
After a while, the new TM running the same job goes through the same cycle.
There is no guarantee that a troubled TM reconnects to the JM within a reasonable time frame, such as minutes. Sometimes it takes days to reconnect successfully.

I am using Flink 1.3.2 on Kubernetes. Could this be caused by network congestion?
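
If it does turn out to be transient network congestion between the pods, I am wondering whether relaxing the Akka death-watch tolerance in flink-conf.yaml would help, something along these lines (example values only, not my current config; if I read the 1.3 docs right, the defaults are roughly 10 s interval, 60 s pause, threshold 12):

    # Akka death-watch between JM and TM (example values only)
    akka.watch.heartbeat.interval: 10 s
    akka.watch.heartbeat.pause: 120 s
    akka.watch.threshold: 12
    # timeout for Akka ask calls (example value)
    akka.ask.timeout: 60 s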

Thanks!

===== Logs from JM ======
2017-11-16 19:14:40,216 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416]
2017-11-16 19:14:40,218 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Task manager akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416/user/taskmanager terminated.
2017-11-16 19:14:40,219 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager was lost/killed: 50cae001c1d97e55889a6051319f4746 @ fps-flink-taskmanager-701246518-rb4k6 (dataPort=43871)
	at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
	at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
	at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
	at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
	at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
	at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
	at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
	at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
	at akka.actor.ActorCell.invoke(ActorCell.scala:486)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2017-11-16 19:14:40,219 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f) switched from state RUNNING to FAILING.
java.lang.Exception: TaskManager was lost/killed: 50cae001c1d97e55889a6051319f4746 @ fps-flink-taskmanager-701246518-rb4k6 (dataPort=43871)
	at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
	at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
	at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
	at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
	at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
	at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1228)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1131)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:125)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
	at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
	at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
	at akka.actor.ActorCell.invoke(ActorCell.scala:486)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2017-11-16 19:14:40,227 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Try to restart or fail the job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f) if no longer possible.
2017-11-16 19:14:40,227 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f) switched from state FAILING to RESTARTING.
2017-11-16 19:14:40,227 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Restarting the job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f).
2017-11-16 19:14:40,229 INFO  org.apache.flink.runtime.instance.InstanceManager             - Unregistered task manager fps-flink-taskmanager-701246518-rb4k6/10.225.131.3. Number of registered task managers 3. Number of available slots 3.
2017-11-16 19:14:40,301 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416]] Caused by: [fps-flink-taskmanager-701246518-rb4k6: Name does not resolve]
2017-11-16 19:14:50,228 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f) switched from state RESTARTING to CREATED.
2017-11-16 19:14:50,228 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Restoring from latest valid checkpoint: Checkpoint 646 @ 1510859465870 for 3de8f35a1af689237e3c5c94023aba3f.
2017-11-16 19:14:50,233 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - No master state to restore
2017-11-16 19:14:50,233 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job KafkaDemo (env:production) (3de8f35a1af689237e3c5c94023aba3f) switched from state CREATED to RUNNING.
2017-11-16 19:14:50,233 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (6ae48dd06a2a48dce68a3ea83c21a462) switched from CREATED to SCHEDULED.
2017-11-16 19:14:50,234 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (6ae48dd06a2a48dce68a3ea83c21a462) switched from SCHEDULED to DEPLOYING.
2017-11-16 19:14:50,234 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (attempt #1) to fps-flink-taskmanager-701246518-v9vjx
2017-11-16 19:14:50,244 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416]] Caused by: [fps-flink-taskmanager-701246518-rb4k6]
2017-11-16 19:14:50,987 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (6ae48dd06a2a48dce68a3ea83c21a462) switched from DEPLOYING to RUNNING.
2017-11-16 19:15:24,480 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 649 @ 1510859724479
2017-11-16 19:15:25,638 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 649 (6724 bytes in 928 ms).
2017-11-16 19:16:22,205 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.] 
2017-11-16 19:16:50,233 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 648 @ 1510859810233
2017-11-16 19:17:12,972 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Received message for non-existing checkpoint 647
2017-11-16 19:17:13,685 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registering TaskManager at fps-flink-taskmanager-701246518-rb4k6/10.225.131.3 which was marked as dead earlier because of a heart-beat timeout.
2017-11-16 19:17:13,685 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at fps-flink-taskmanager-701246518-rb4k6 (akka.tcp://flink@fps-flink-taskmanager-701246518-rb4k6:46416/user/taskmanager) as 8905a72ca2655b9ed0bddb50ff3c49a9. Current number of registered hosts is 4. Current number of alive task slots is 4.
2017-11-16 19:17:24,480 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 650 @ 1510859844480

===== Logs from TM ======
2017-11-16 19:13:33,177 INFO  org.apache.flink.runtime.state.DefaultOperatorStateBackend    - DefaultOperatorStateBackend snapshot (File Stream Factory @ s3a://zendesk-usw2-fraud-prevention-production/checkpoints/3de8f35a1af689237e3c5c94023aba3f, synchronous part) in thread Thread[Async calls on Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1),5,Flink Task Threads] took 1 ms.
2017-11-16 19:14:35,820 WARN  akka.remote.RemoteWatcher                                     - Detected unreachable: [akka.tcp://flink@fps-flink-jobmanager:6123]
2017-11-16 19:14:37,525 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager: JobManager is no longer reachable
2017-11-16 19:15:50,827 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Cancelling all computations and discarding all cached data.
2017-11-16 19:16:23,596 INFO  com.amazonaws.http.AmazonHttpClient                           - Unable to execute HTTP request: The target server failed to respond
org.apache.http.NoHttpResponseException: The target server failed to respond
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
	at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
	at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:1144)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:1133)
	at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:142)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream.close(HadoopDataOutputStream.java:48)
	at org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
	at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:319)
	at org.apache.flink.runtime.checkpoint.AbstractAsyncSnapshotIOCallable.closeStreamAndGetStateHandle(AbstractAsyncSnapshotIOCallable.java:100)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:270)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:233)
	at org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:906)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
2017-11-16 19:16:35,816 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:16:35,819 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-jobmanager:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has a UID that has been quarantined. Association aborted.] 
2017-11-16 19:16:45,817 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager akka://flink/user/taskmanager disconnects from JobManager akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager: JobManager is no longer reachable
	at org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:1095)
	at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:311)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
	at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
	at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:120)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
	at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
	at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
	at akka.actor.ActorCell.invoke(ActorCell.scala:486)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2017-11-16 19:16:55,815 INFO  org.apache.flink.runtime.state.DefaultOperatorStateBackend    - DefaultOperatorStateBackend snapshot (File Stream Factory @ s3a://zendesk-usw2-fraud-prevention-production/checkpoints/3de8f35a1af689237e3c5c94023aba3f, asynchronous part) in thread Thread[pool-7-thread-647,5,Flink Task Threads] took 202637 ms.
2017-11-16 19:16:55,816 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:17:00,825 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Disassociating from JobManager
2017-11-16 19:17:00,825 INFO  org.apache.flink.runtime.blob.BlobCache                       - Shutting down BlobCache
2017-11-16 19:17:13,677 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Successful registration at JobManager (akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager), starting network stack and library cache.
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Determined BLOB server address to be fps-flink-jobmanager/10.231.37.240:6124. Starting BLOB cache.
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.blob.BlobCache                       - Created BLOB cache storage directory /tmp/blobStore-133033d7-a3c1-4ffb-b360-3d0efdc355aa
2017-11-16 19:17:31,097 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 java.util.regex.Pattern.matcher(Pattern.java:1093)
java.lang.String.replace(String.java:2239)
com.trueaccord.scalapb.textformat.TextFormatUtils$.escapeDoubleQuotesAndBackslashes(TextFormatUtils.scala:160)
com.trueaccord.scalapb.textformat.TextGenerator.addMaybeEscape(TextGenerator.scala:29)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:74)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:41)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
scala.collection.AbstractTraversable.addString(Traversable.scala:104)
scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
scala.collection.TraversableLike$class.toString(TraversableLike.scala:645)
scala.collection.SeqLike$class.toString(SeqLike.scala:682)
scala.collection.AbstractSeq.toString(Seq.scala:41)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:12)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)

2017-11-16 19:17:31,106 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 com.trueaccord.scalapb.textformat.TextGenerator.add(TextGenerator.scala:19)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:33)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
scala.collection.AbstractTraversable.addString(Traversable.scala:104)
scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
scala.collection.TraversableLike$class.toString(TraversableLike.scala:645)
scala.collection.SeqLike$class.toString(SeqLike.scala:682)
scala.collection.AbstractSeq.toString(Seq.scala:41)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:12)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)

2017-11-16 19:18:05,486 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 java.lang.String.replace(String.java:2240)
com.trueaccord.scalapb.textformat.TextGenerator.addMaybeEscape(TextGenerator.scala:29)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:74)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:41)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:14)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:13)
scala.collection.immutable.List.foreach(List.scala:381)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:13)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)

2017-11-16 19:18:34,375 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:18:36,257 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c) [FAILED]
2017-11-16 19:18:36,618 ERROR org.apache.flink.runtime.taskmanager.TaskManager              - Cannot find task with ID 484ebabbd13dce5e8503d88005bcdb6c to unregister.

RE: Task manager suddenly lost connection to JM

Chan, Regina

I have a similar problem where I lose Task Managers. I originally thought it had to do with memory issues, but it doesn't look like that's the case… Any ideas here? Am I missing something obvious?
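
For reference, the memory-related settings I would expect to matter here on YARN are along these lines (example values only; I may have the exact 1.3 key names slightly off):

    # TaskManager heap and YARN container cutoff (example values only)
    taskmanager.heap.mb: 4096
    containerized.heap-cutoff-ratio: 0.25
    containerized.heap-cutoff-min: 600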

 

11/19/2017 23:43:51     CHAIN DataSource (at createInput(ExecutionEnvironment.java:553) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> Filter (Filter at readParquetFile(FlinkRefinerOperations.java:899)) -> Map (Map at readParquetFile(FlinkRefinerOperations.java:902)) -> Map (Map at handleMilestoning(MergeTask.java:259))(1/1) switched to FAILED
java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='container_e5420_1511132977419_37521_01_000027'} @ d173636-336.dc.gs.com (dataPort=45003)
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)

Thanks,
Regina

From: Hao Sun [mailto:[hidden email]]
Sent: Thursday, November 16, 2017 5:37 PM
To: user
Subject: Task manager suddenly lost connection to JM

 

  at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
  at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:120)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:44)
  at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
  at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
  at akka.actor.ActorCell.invoke(ActorCell.scala:486)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2017-11-16 19:16:55,815 INFO  org.apache.flink.runtime.state.DefaultOperatorStateBackend    - DefaultOperatorStateBackend snapshot (File Stream Factory @ s3a://zendesk-usw2-fraud-prevention-production/checkpoints/3de8f35a1af689237e3c5c94023aba3f, asynchronous part) in thread Thread[pool-7-thread-647,5,Flink Task Threads] took 202637 ms.
2017-11-16 19:16:55,816 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:17:00,825 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Disassociating from JobManager
2017-11-16 19:17:00,825 INFO  org.apache.flink.runtime.blob.BlobCache                       - Shutting down BlobCache
2017-11-16 19:17:13,677 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Trying to register at JobManager akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Successful registration at JobManager (akka.tcp://flink@fps-flink-jobmanager:6123/user/jobmanager), starting network stack and library cache.
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Determined BLOB server address to be fps-flink-jobmanager/10.231.37.240:6124. Starting BLOB cache.
2017-11-16 19:17:13,687 INFO  org.apache.flink.runtime.blob.BlobCache                       - Created BLOB cache storage directory /tmp/blobStore-133033d7-a3c1-4ffb-b360-3d0efdc355aa
2017-11-16 19:17:31,097 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 java.util.regex.Pattern.matcher(Pattern.java:1093)
java.lang.String.replace(String.java:2239)
com.trueaccord.scalapb.textformat.TextFormatUtils$.escapeDoubleQuotesAndBackslashes(TextFormatUtils.scala:160)
com.trueaccord.scalapb.textformat.TextGenerator.addMaybeEscape(TextGenerator.scala:29)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:74)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:41)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
scala.collection.AbstractTraversable.addString(Traversable.scala:104)
scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
scala.collection.TraversableLike$class.toString(TraversableLike.scala:645)
scala.collection.SeqLike$class.toString(SeqLike.scala:682)
scala.collection.AbstractSeq.toString(Seq.scala:41)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:12)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)
 
2017-11-16 19:17:31,106 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 com.trueaccord.scalapb.textformat.TextGenerator.add(TextGenerator.scala:19)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:33)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
scala.collection.AbstractTraversable.addString(Traversable.scala:104)
scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
scala.collection.TraversableLike$class.toString(TraversableLike.scala:645)
scala.collection.SeqLike$class.toString(SeqLike.scala:682)
scala.collection.AbstractSeq.toString(Seq.scala:41)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:12)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)
 
2017-11-16 19:18:05,486 WARN  org.apache.flink.runtime.taskmanager.Task                     - Task 'Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1)' did not react to cancelling signal, but is stuck in method:
 java.lang.String.replace(String.java:2240)
com.trueaccord.scalapb.textformat.TextGenerator.addMaybeEscape(TextGenerator.scala:29)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:74)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:41)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:14)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:13)
scala.collection.immutable.List.foreach(List.scala:381)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:13)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)
 
2017-11-16 19:18:34,375 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:18:36,257 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c) [FAILED]
2017-11-16 19:18:36,618 ERROR org.apache.flink.runtime.taskmanager.TaskManager              - Cannot find task with ID 484ebabbd13dce5e8503d88005bcdb6c to unregister.

Re: Task manager suddenly lost connection to JM

Stephan Ewen
We recently observed something like that in some tests and the problem was the following:

A Netty dependency pulled in via Hadoop or ZooKeeper conflicted with Akka's Netty dependency, which led to remote connection failures.

In Flink 1.4, we fix that by shading Akka's Netty to make sure this cannot happen.

If that is in fact the problem you are seeing, then on Flink 1.3.2 you need to exclude Hadoop's / ZooKeeper's Netty from the classpath.

Best,
Stephan
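
A minimal sketch of the kind of exclusion described above, assuming the job is built with sbt and pulls Hadoop in through hadoop-aws (for the s3a checkpoints) and ZooKeeper through the zookeeper artifact; the coordinates and versions below are illustrative and not taken from this thread:

// build.sbt (sketch): keep Hadoop's / ZooKeeper's Netty 3.x off the job classpath
// so it cannot clash with the Netty used by Akka remoting in Flink 1.3.2.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.3.2" % "provided",
  // hadoop-aws is just one example of a dependency that drags in io.netty:netty
  ("org.apache.hadoop" % "hadoop-aws" % "2.7.3")
    .exclude("io.netty", "netty")
    .exclude("io.netty", "netty-all"),
  ("org.apache.zookeeper" % "zookeeper" % "3.4.10")
    .exclude("io.netty", "netty")
)

After rebuilding, it is worth checking (for example with a dependency-tree plugin, or by listing the jars bundled into the assembled fat jar) that only one Netty version remains before redeploying the TaskManagers.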


On Mon, Nov 20, 2017 at 6:02 AM, Chan, Regina <[hidden email]> wrote:

I have a similar problem where I lose Task Managers. I originally thought it had to do with memory issues, but it doesn't look like that's the case… Any ideas here? Am I missing something obvious?

 

11/19/2017 23:43:51     CHAIN DataSource (at createInput(ExecutionEnvironment.java:553) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> Filter (Filter at readParquetFile(FlinkRefinerOperations.java:899)) -> Map (Map at readParquetFile(FlinkRefinerOperations.java:902)) -> Map (Map at handleMilestoning(MergeTask.java:259))(1/1) switched to FAILED

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='container_e5420_1511132977419_37521_01_000027'} @ d173636-336.dc.gs.com (dataPort=45003)
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)

 

 

Thanks,

Regina

 

From: Hao Sun [mailto:[hidden email]]
Sent: Thursday, November 16, 2017 5:37 PM
To: user
Subject: Task manager suddenly lost connection to JM

 

 java.lang.String.replace(String.java:2240)
com.trueaccord.scalapb.textformat.TextGenerator.addMaybeEscape(TextGenerator.scala:29)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:74)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:41)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$printField$1.apply(Printer.scala:25)
scala.collection.Iterator$class.foreach(Iterator.scala:742)
scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:25)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.printFieldValue(Printer.scala:70)
com.trueaccord.scalapb.textformat.Printer$.printSingleField(Printer.scala:37)
com.trueaccord.scalapb.textformat.Printer$.printField(Printer.scala:28)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:13)
com.trueaccord.scalapb.textformat.Printer$$anonfun$print$2.apply(Printer.scala:12)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:12)
com.trueaccord.scalapb.textformat.Printer$.print(Printer.scala:8)
com.trueaccord.scalapb.textformat.Printer$.printToString(Printer.scala:19)
com.trueaccord.scalapb.TextFormat$.printToUnicodeString(TextFormat.scala:34)
com.zendesk.fraud_prevention.data_types.MaxwellEvent.toString(MaxwellEvent.scala:212)
java.lang.String.valueOf(String.java:2994)
java.lang.StringBuilder.append(StringBuilder.java:131)
scala.StringContext.standardInterpolator(StringContext.scala:125)
scala.StringContext.s(StringContext.scala:95)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:14)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper$$anonfun$flatMap$1.apply(MaxwellFilterFlatMapper.scala:13)
scala.collection.immutable.List.foreach(List.scala:381)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:13)
com.zendesk.fraud_prevention.functions.MaxwellFilterFlatMapper.flatMap(MaxwellFilterFlatMapper.scala:10)
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:304)
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:393)
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:233)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.emitRecord(Kafka09Fetcher.java:193)
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:152)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:483)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
java.lang.Thread.run(Thread.java:748)
 
2017-11-16 19:18:34,375 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c).
2017-11-16 19:18:36,257 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermark(maxwell.tickets) -> MaxwellFPSEvent->InfluxDBData(maxwell.tickets) (1/1) (484ebabbd13dce5e8503d88005bcdb6c) [FAILED]
2017-11-16 19:18:36,618 ERROR org.apache.flink.runtime.taskmanager.TaskManager              - Cannot find task with ID 484ebabbd13dce5e8503d88005bcdb6c to unregister.

Re: Task manager suddenly lost connection to JM

Hao Sun

Thanks! Let me try.


On Mon, Nov 20, 2017, 12:46 Stephan Ewen <[hidden email]> wrote:
We recently observed something like that in some tests and the problem was the following:

A Netty dependency pulled in via Hadoop or ZooKeeper conflicted with Akka's Netty dependency, which led to remote connection failures.

In Flink 1.4, we fix that by shading Akka's Netty to make sure this cannot happen.

If that is in fact the problem you are seeing, then in Flink 1.3.2 you need to exclude Hadoop's / ZooKeeper's Netty from the classpath.
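
[Editor's note: a minimal sbt sketch of such an exclusion, assuming an sbt build with Hadoop and ZooKeeper as direct dependencies. The artifact coordinates and versions below are illustrative, not taken from this thread; adjust them to whatever your job actually pulls in.]

// build.sbt -- illustrative only.
// Keep the transitive Netty 3.x pulled in by Hadoop / ZooKeeper off the classpath,
// so it cannot clash with the Netty version Akka expects.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.3.2" % "provided",
  ("org.apache.hadoop" % "hadoop-client" % "2.7.3")
    .exclude("io.netty", "netty")
    .exclude("io.netty", "netty-all"),
  ("org.apache.zookeeper" % "zookeeper" % "3.4.10")
    .exclude("io.netty", "netty")
)

With Maven the same effect can be achieved with <exclusions> blocks on those dependencies; in either build, a dependency tree report (e.g. mvn dependency:tree, or the sbt-dependency-graph plugin) is a quick way to confirm which Netty versions actually end up on the classpath.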

Best,
Stephan


On Mon, Nov 20, 2017 at 6:02 AM, Chan, Regina <[hidden email]> wrote:

I have a similar problem where I lose Task Managers. I originally thought it had to do with memory issues, but it doesn't look like that's the case… Any ideas here? Am I missing something obvious?

 

11/19/2017 23:43:51     CHAIN DataSource (at createInput(ExecutionEnvironment.java:553) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> Filter (Filter at readParquetFile(FlinkRefinerOperations.java:899)) -> Map (Map at readParquetFile(FlinkRefinerOperations.java:902)) -> Map (Map at handleMilestoning(MergeTask.java:259))(1/1) switched to FAILED

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='container_e5420_1511132977419_37521_01_000027'} @ d173636-336.dc.gs.com (dataPort=45003)
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
        at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
        at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)

 

 

Thanks,

Regina

 


