I have 4 nodes for flink cluster using flink 1.3.2 and hadoop 2.6.5. All
configurations are same in 4 nodes.
Running yarn session and flink job, I killed node#2(flink-02)’s
TM(YarnTaskManager) process.
After resoucemanager make new TM container, job still failed status.
But when I killed JM(YarnApplicationMasterRunner) process, all behavior
seems ok. My job restarted well.
I’m using DFS for storageDir.
Flink configuration file and log files are attached. If any other
information needed, please let me know.
Here is use case, ------------------------------
-------------------------------------------------------------------------
1. start yarn session
flink-01 ~]$ flink/bin/yarn-session.sh -n 4 -s 8 -nm flink -d
2. submit job
flink-01 ~]$ flink/bin/flink run -d -p 3 -c a.b.c.Application abc.jar
3. find TM pid and kill it
flink-02 ~]$ jps | grep YarnTaskManager
4022 YarnTaskManager
flink-02 ~]$ kill 4022
And JM logs ------------------------------------------------------------
-------------------------------------------
2017-09-04 00:50:57,830 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Disassociated]
2017-09-04 00:50:58,013 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Container
container_1504449399430_0005_01_000004 failed. Exit status: 143
2017-09-04 00:50:58,014 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Diagnostics for container
container_1504449399430_0005_01_000004 in state COMPLETE : exitStatus=143
diagnostics=Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
2017-09-04 00:50:58,014 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Total number of failed containers
so far: 1
2017-09-04 00:50:58,014 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Received new container:
container_1504449399430_0005_01_000005 - Remaining pending container
requests: 0
2017-09-04 00:50:58,014 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Launching TaskManager in
container ContainerInLaunch @ 1504453858014: Container: [ContainerId:
container_1504449399430_0005_01_000005, NodeId: flink-02:34821,
NodeHttpAddress: flink-02:8042, Resource: <memory:8192, vCores:1>,
Priority: 0, Token: Token { kind: ContainerToken, service: 10.1.0.5:34821
}, ] on host flink-02
2017-09-04 00:50:58,015 INFO org.apache.hadoop.yarn.client.api.impl.
ContainerManagementProtocolProxy - Opening proxy : flink-02:34821
2017-09-04 00:50:58,015 INFO org.apache.flink.yarn.
YarnJobManager - Task manager
akka.tcp://flink@flink-02:45104/user/taskmanager terminated.
2017-09-04 00:50:58,016 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Source: src -> Service: vps -> Sink:
snk (3/3) (33b33a9c15eb0107b320e375374ac07f) switched from RUNNING to
FAILED.
java.lang.Exception: TaskManager was lost/killed:
container_1504449399430_0005_01_000004 @ flink-02 (dataPort=42381)
at org.apache.flink.runtime.instance.SimpleSlot.
releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.
releaseSharedSlot(SlotSharingGroupAssignment.java:533)
at org.apache.flink.runtime.instance.SharedSlot.
releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(
Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.
unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$
apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(
JobManager.scala:1228)
at org.apache.flink.runtime.jobmanager.JobManager$$
anonfun$handleMessage$1.applyOrElse(JobManager.scala:474)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.clusterframework.
ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(
ContaineredJobManager.scala:100)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.yarn.YarnJobManager$$anonfun$
handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:103)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$
anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.
scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.
applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.flink.runtime.jobmanager.JobManager.
aroundReceive(JobManager.scala:125)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$
AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(
ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
ForkJoinWorkerThread.java:107)
2017-09-04 00:50:58,018 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Job vp (
6e73f6babf36c9321efc0524a82dede1) switched from state RUNNING to FAILING.
java.lang.Exception: TaskManager was lost/killed:
container_1504449399430_0005_01_000004 @ flink-02 (dataPort=42381)
at org.apache.flink.runtime.instance.SimpleSlot.
releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.
releaseSharedSlot(SlotSharingGroupAssignment.java:533)
at org.apache.flink.runtime.instance.SharedSlot.
releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(
Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.
unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$
apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(
JobManager.scala:1228)
at org.apache.flink.runtime.jobmanager.JobManager$$
anonfun$handleMessage$1.applyOrElse(JobManager.scala:474)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.clusterframework.
ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(
ContaineredJobManager.scala:100)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.yarn.YarnJobManager$$anonfun$
handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:103)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$
anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.
scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.
applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.flink.runtime.jobmanager.JobManager.
aroundReceive(JobManager.scala:125)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$
AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(
ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
ForkJoinWorkerThread.java:107)
2017-09-04 00:50:58,027 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Source: src -> Service: vps -> Sink:
snk (1/3) (02232baaa7b3f88982cbf40cb5ebe489) switched from RUNNING to
CANCELING.
2017-09-04 00:50:58,029 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Source: src -> Service: vps -> Sink:
snk (2/3) (952c34a277df34aa7e3216a8b96ae2d5) switched from RUNNING to
CANCELING.
2017-09-04 00:50:58,029 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - Requesting new TaskManager
container with 8192 megabytes memory. Pending requests: 1
2017-09-04 00:50:58,030 INFO org.apache.flink.runtime.
instance.InstanceManager - Unregistered task manager flink-02/
10.1.0.5. Number of registered task managers 2. Number of available slots
16.
2017-09-04 00:50:58,065 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Source: src -> Service: vps -> Sink:
snk (1/3) (02232baaa7b3f88982cbf40cb5ebe489) switched from CANCELING to
CANCELED.
2017-09-04 00:50:58,071 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Source: src -> Service: vps -> Sink:
snk (2/3) (952c34a277df34aa7e3216a8b96ae2d5) switched from CANCELING to
CANCELED.
2017-09-04 00:50:58,072 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Try to restart or fail the job vp (
6e73f6babf36c9321efc0524a82dede1) if no longer possible.
2017-09-04 00:50:58,072 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Job vp (
6e73f6babf36c9321efc0524a82dede1) switched from state FAILING to FAILED.
java.lang.Exception: TaskManager was lost/killed:
container_1504449399430_0005_01_000004 @ flink-02 (dataPort=42381)
at org.apache.flink.runtime.instance.SimpleSlot.
releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.
releaseSharedSlot(SlotSharingGroupAssignment.java:533)
at org.apache.flink.runtime.instance.SharedSlot.
releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(
Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.
unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$
apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(
JobManager.scala:1228)
at org.apache.flink.runtime.jobmanager.JobManager$$
anonfun$handleMessage$1.applyOrElse(JobManager.scala:474)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.clusterframework.
ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(
ContaineredJobManager.scala:100)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.yarn.YarnJobManager$$anonfun$
handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:103)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$
anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.
scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.
applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.flink.runtime.jobmanager.JobManager.
aroundReceive(JobManager.scala:125)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$
AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(
ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
ForkJoinWorkerThread.java:107)
2017-09-04 00:50:58,072 INFO org.apache.flink.runtime.
executiongraph.ExecutionGraph - Could not restart the job vp (
6e73f6babf36c9321efc0524a82dede1) because the restart strategy prevented it.
java.lang.Exception: TaskManager was lost/killed:
container_1504449399430_0005_01_000004 @ flink-02 (dataPort=42381)
at org.apache.flink.runtime.instance.SimpleSlot.
releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.
releaseSharedSlot(SlotSharingGroupAssignment.java:533)
at org.apache.flink.runtime.instance.SharedSlot.
releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(
Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.
unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$
apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(
JobManager.scala:1228)
at org.apache.flink.runtime.jobmanager.JobManager$$
anonfun$handleMessage$1.applyOrElse(JobManager.scala:474)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.clusterframework.
ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(
ContaineredJobManager.scala:100)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.yarn.YarnJobManager$$anonfun$
handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:103)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$
anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
at scala.runtime.AbstractPartialFunction.apply(
AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(
LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.
scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.
applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.flink.runtime.jobmanager.JobManager.
aroundReceive(JobManager.scala:125)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$
AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(
ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
ForkJoinWorkerThread.java:107)
2017-09-04 00:50:58,073 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator
- Stopping checkpoint coordinator for job 6e73f6babf36c9321efc0524a82dede1
2017-09-04 00:50:58,073 INFO org.apache.flink.runtime.checkpoint.
ZooKeeperCompletedCheckpointStore - Shutting down
2017-09-04 00:50:58,073 INFO org.apache.flink.runtime.checkpoint.
ZooKeeperCompletedCheckpointStore - Removing /flink/cluster_one/
checkpoints/6e73f6babf36c9321efc0524a82dede1 from ZooKeeper
2017-09-04 00:50:58,102 INFO
org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter
- Shutting down.
2017-09-04 00:50:58,102 INFO
org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter
- Removing /checkpoint-counter/6e73f6babf36c9321efc0524a82dede1 from
ZooKeeper
2017-09-04 00:50:58,118 INFO org.apache.flink.runtime.jobmanager.
ZooKeeperSubmittedJobGraphStore - Removed job graph
6e73f6babf36c9321efc0524a82dede1 from ZooKeeper.
2017-09-04 00:51:01,000 INFO org.apache.flink.runtime.
instance.InstanceManager - Registered TaskManager at flink-02
(akka.tcp://flink@flink-02:37281/user/taskmanager) as
6f8a6941cddb361de1643fa47b71a1db. Current number of registered hosts is 3.
Current number of alive task slots is 24.
2017-09-04 00:51:01,000 INFO org.apache.flink.yarn.
YarnFlinkResourceManager - TaskManager
container_1504449399430_0005_01_000005 has started.
2017-09-04 00:51:02,868 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:07,883 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:12,903 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:17,924 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:22,942 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:27,963 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:32,983 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system
[akka.tcp://flink@flink-02:45104] has failed, address is now gated for
[5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-02:45104]]
Caused by: [Connection refused: flink-02/10.1.0.5:45104]
2017-09-04 00:51:38,004 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:43,024 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:48,043 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:53,063 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:51:58,082 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:03,105 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:08,123 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:13,143 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:18,164 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:23,184 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:28,202 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:33,225 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:38,244 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104]
has failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:43,263 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:48,282 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:53,302 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:52:58,324 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:03,344 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:08,362 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:13,384 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:18,403 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:23,422 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:28,443 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:33,462 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:38,482 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:43,502 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:48,522 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:53,549 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:53:58,573 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink@flink-02:45104] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@flink-02:45104]] Caused by: [Connection refused:
flink-02/10.1.0.5:45104]
2017-09-04 00:54:03,592 ERROR Remoting
- Association to [akka.tcp://flink@flink-02:45104] with
UID [1818461746] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Delivery of system messages timed
out and they were dropped.
at akka.remote.ReliableDeliverySupervisor$$
anonfun$gated$1.applyOrElse(Endpoint.scala:336)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.ReliableDeliverySupervisor.
aroundReceive(Endpoint.scala:189)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$
AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(
ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
ForkJoinWorkerThread.java:107)
ᐧ
ᐧ
jobmanager.log (104K) <
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/attachment/15351/0/jobmanager.log>flink-conf.yaml (10K) <
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/attachment/15351/1/flink-conf.yaml>killed-taskmanager.log (74K) <
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/attachment/15351/2/killed-taskmanager.log>new-taskmanager.log (52K) <
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/attachment/15351/3/new-taskmanager.log>