Temporary failure in name resolution

miki haiat
I tried to run Flink on Kubernetes and as a standalone HA cluster, and in both cases the TaskManager got lost/killed after a few hours or days.
I'm using Ubuntu and Flink 1.4.2.


This is part of the log; I have also attached the full log.

org.tlv.esb.StreamingJob$EsbTraceEvictor@20ffca60, WindowedStream.apply(WindowedStream.java:1061)) -> Sink: Unnamed (1/1) (91b27853aa30be93322d9c516ec266bf) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager was lost/killed: 6dc6cd5c15588b49da39a31b6480b2e3 @ beam2 (dataPort=42587)
at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:523)
at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1198)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1096)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:122)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:374)
at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:511)
at akka.actor.ActorCell.invoke(ActorCell.scala:494)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-04-02 13:09:01,727 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming esb correlate msg (0db04ff29124f59a123d4743d89473ed) switched from state RUNNING to FAILING.
java.lang.Exception: TaskManager was lost/killed: 6dc6cd5c15588b49da39a31b6480b2e3 @ beam2 (dataPort=42587)
at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:523)
at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1198)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1096)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:122)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:374)
at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:511)
at akka.actor.ActorCell.invoke(ActorCell.scala:494)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-04-02 13:09:01,737 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (a10c25c2d3de57d33828524938fcfcc2) switched from RUNNING to CANCELING.




log_flink.log (227K) Download Attachment
Re: Temporary failure in name resolution

Timo Walther
Hi Miki,

To me this sounds like your job has a resource leak, such that memory fills up and the JVM of the TaskManager is killed at some point. What does your job look like? I see a WindowedStream.apply, which might not be appropriate if you have big or frequent windows where the evaluation happens so late that the state becomes too big.

Regards,
Timo
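
To illustrate the point about window state growing: a minimal, Flink-free sketch contrasting a window that buffers every element until it is evaluated with one that keeps only a running aggregate. The class names (`WindowStateSketch`, `BufferingWindow`, `IncrementalWindow`) and the sum aggregation are hypothetical and only stand in for the general idea; this is not Flink's implementation. Note also that, as far as I know, a window with an Evictor has to keep its elements anyway, so the incremental variant is only an option where no evictor is needed.

```java
import java.util.ArrayList;
import java.util.List;

class WindowStateSketch {

    // Buffering style: every element is retained until the window is evaluated.
    static final class BufferingWindow {
        private final List<Long> elements = new ArrayList<>();

        void add(long value) {
            elements.add(value);
        }

        int retainedElements() {
            return elements.size();
        }

        long sum() {
            long total = 0;
            for (long v : elements) {
                total += v;
            }
            return total;
        }
    }

    // Incremental style: only the running aggregate is retained.
    static final class IncrementalWindow {
        private long sum;

        void add(long value) {
            sum += value;
        }

        long sum() {
            return sum;
        }
    }

    public static void main(String[] args) {
        BufferingWindow buffering = new BufferingWindow();
        IncrementalWindow incremental = new IncrementalWindow();
        for (long i = 1; i <= 1000; i++) {
            buffering.add(i);
            incremental.add(i);
        }
        System.out.println("buffered elements: " + buffering.retainedElements());
        System.out.println("buffering sum:     " + buffering.sum());
        System.out.println("incremental sum:   " + incremental.sum());
    }
}
```

Both variants produce the same result, but the buffering one holds state proportional to the number of elements seen since the last evaluation or eviction.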


Re: Temporary failure in name resolution

Hao Sun
Hi Timo, we have a similar issue: a TM got killed by a job. Is there a way to monitor the JVM status? If it is through the metrics, which metrics should I look at?
We are running Flink on K8s. Is it possible that a job consumes so much network bandwidth that the JM and TM cannot connect?

Re: Temporary failure in name resolution

miki haiat
Hi,

I checked the code again to figure out where the problem could be.

I just wondered whether I'm implementing the Evictor correctly?

Full code:




public static class EsbTraceEvictor implements Evictor<EsbTrace, GlobalWindow> {
    org.slf4j.Logger LOG = LoggerFactory.getLogger(EsbTraceEvictor.class);

    @Override
    public void evictBefore(Iterable<TimestampedValue<EsbTrace>> iterable, int i, GlobalWindow globalWindow, Evictor.EvictorContext evictorContext) {

    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<EsbTrace>> elements, int i, GlobalWindow globalWindow, EvictorContext evictorContext) {
        // change it to current processing time
        long min5min = LocalDateTime.now().minusMinutes(5).getNano();
        LOG.info("time now -5min", min5min);
        DateTimeFormatter format = DateTimeFormatter.ISO_DATE_TIME;
        for (Iterator<TimestampedValue<EsbTrace>> iterator = elements.iterator(); iterator.hasNext(); ) {
            TimestampedValue<EsbTrace> element = iterator.next();
            LocalDateTime el = LocalDateTime.parse(element.getValue().getEndDate(), format);
            LOG.info("element time ", element.getValue().getEndDate());
            if (el.minusMinutes(5).getNano() <= min5min) {
                iterator.remove();
            }
        }
    }
}
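
One thing that may be worth checking in the evictor above: `LocalDateTime.getNano()` returns only the nano-of-second field (0 to 999,999,999), not a point in time, so comparing two `getNano()` values does not compare the timestamps themselves. Below is a minimal sketch of an age check that compares the `LocalDateTime` values directly, using only `java.time`. The class name `EvictionCutoffSketch`, the method `isExpired`, and the sample timestamps are hypothetical, and the sketch assumes the intent is to drop elements whose end date is more than five minutes old; it is not a confirmed fix for the TaskManager loss.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class EvictionCutoffSketch {

    static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_DATE_TIME;

    // Returns true when the parsed end date lies strictly before (now - maxAge).
    static boolean isExpired(String endDate, LocalDateTime now, Duration maxAge) {
        LocalDateTime end = LocalDateTime.parse(endDate, FORMAT);
        LocalDateTime cutoff = now.minus(maxAge);
        return end.isBefore(cutoff);
    }

    public static void main(String[] args) {
        // Fixed "now" so the result does not depend on the wall clock.
        LocalDateTime now = LocalDateTime.of(2018, 4, 2, 13, 9, 1);
        Duration maxAge = Duration.ofMinutes(5);

        System.out.println(isExpired("2018-04-02T13:00:00", now, maxAge));
        System.out.println(isExpired("2018-04-02T13:05:00", now, maxAge));

        // Two timestamps a year apart still share the same nano-of-second field.
        System.out.println(now.getNano() == now.minusYears(1).getNano());
    }
}
```

Inside `evictAfter`, a check like this would replace the `getNano()` comparison before calling `iterator.remove()`. Separately, SLF4J only substitutes an argument where the message contains a `{}` placeholder, so a call such as `LOG.info("time now -5min", min5min)` logs the text without the value.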





Re: Temporary failure in name resolution

Fabian Hueske-2
Hi,

The issue might be related to garbage collection pauses during which the TM JVM cannot communicate with the JM.
The metrics contain stats for memory consumption [1] and GC activity [2] that can help to diagnose the problem.
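
For a quick look at the same kind of numbers outside of Flink's metrics system, here is a minimal sketch that reads heap usage and accumulated GC time from the standard `java.lang.management` MXBeans. The class name `JvmStatsSketch` and its method names are hypothetical, and it reports on whichever JVM runs it; it does not query a TaskManager and is not Flink's metrics API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class JvmStatsSketch {

    // Heap bytes currently in use by this JVM.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Approximate accumulated GC time in milliseconds, summed over all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("total GC time (ms): " + totalGcTimeMillis());
    }
}
```

Sampling these values periodically and watching for a steadily climbing heap or a sharp rise in GC time shortly before the TM is lost would point towards the memory or GC-pause explanations discussed above.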


2018-04-04 8:30 GMT+02:00 miki haiat <[hidden email]>:
HI , 

i checked the code again the figure out where the problem  can be

i just wondered if im implementing the Evictor correctly  ?

full code 




public static class EsbTraceEvictor implements Evictor<EsbTrace, GlobalWindow> {
org.slf4j.Logger LOG = LoggerFactory.getLogger(EsbTraceEvictor.class);
@Override
public void evictBefore(Iterable<TimestampedValue<EsbTrace>> iterable, int i, GlobalWindow globalWindow, Evictor.EvictorContext evictorContext) {

}

@Override
public void evictAfter(Iterable<TimestampedValue<EsbTrace>> elements, int i, GlobalWindow globalWindow, EvictorContext evictorContext) {
//change it to current procces time
long min5min = LocalDateTime.now().minusMinutes(5).getNano();
LOG.info("time now -5min",min5min);
DateTimeFormatter format = DateTimeFormatter.ISO_DATE_TIME;
for (Iterator<TimestampedValue<EsbTrace>> iterator = elements.iterator(); iterator.hasNext(); ) {
TimestampedValue<EsbTrace> element = iterator.next();
LocalDateTime el = LocalDateTime.parse(element.getValue().getEndDate(),format);
LOG.info("element time ",element.getValue().getEndDate());
if (el.minusMinutes(5).getNano() <= min5min) {
iterator.remove();
}
}
}
}





On Tue, Apr 3, 2018 at 4:28 PM, Hao Sun <[hidden email]> wrote:
Hi Timo, we do have similar issue, TM got killed by a job. Is there a way to monitor JVM status? If through the monitor metrics, what metric I should look after?
We are running Flink on K8S. Is there a possibility that a job consumes too much network bandwidth, so JM and TM can not connect?

On Tue, Apr 3, 2018 at 3:11 AM Timo Walther <[hidden email]> wrote:
Hi Miki,

for me this sounds like your job has a resource leak such that your memory fills up and the JVM of the TaskManager is killed at some point. How does your job look like? I see a WindowedStream.apply which might not be appropriate if you have big/frequent windows where the evaluation happens too late such that the state becomes too big.

Regards,
Timo


Am 03.04.18 um 08:26 schrieb miki haiat:
i tried to run flink on kubernetes and  as stand alone HA cluster and on both cases  task manger got lost/kill after few hours/days    .
im using ubuntu and flink 1.4.2 .


this is part of the log , i also attaches the full log .

org.tlv.esb.StreamingJob$EsbTraceEvictor@20ffca60, WindowedStream.apply(WindowedStream.java:1061)) -> Sink: Unnamed (1/1) (91b27853aa30be93322d9c516ec266bf) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager was lost/killed: 6dc6cd5c15588b49da39a31b6480b2e3 @ beam2 (dataPort=42587)
at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:523)
at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1198)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1096)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:122)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:374)
at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:511)
at akka.actor.ActorCell.invoke(ActorCell.scala:494)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-04-02 13:09:01,727 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming esb correlate msg (0db04ff29124f59a123d4743d89473ed) switched from state RUNNING to FAILING.
java.lang.Exception: TaskManager was lost/killed: 6dc6cd5c15588b49da39a31b6480b2e3 @ beam2 (dataPort=42587)
at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:523)
at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:212)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1198)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:1096)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:49)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:122)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:374)
at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:511)
at akka.actor.ActorCell.invoke(ActorCell.scala:494)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-04-02 13:09:01,737 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (a10c25c2d3de57d33828524938fcfcc2) switched from RUNNING to CANCELING.







Re: Temporary failure in name resolution

miki haiat
I attached the full logs from the JM and TM.

Memory and GC look fine.

I'm not really sure what is causing the JM/TM to crash...

On Wed, Apr 4, 2018 at 4:30 PM, Fabian Hueske <[hidden email]> wrote:
Hi,

The issue might be related to garbage collection pauses during which the TM JVM cannot communicate with the JM.
The metrics contain stats for memory consumption [1] and GC activity [2] that can help diagnose the problem.
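
For anyone who wants to see the raw numbers behind such memory and GC statistics, here is a minimal sketch that reads them from the JDK's own `java.lang.management` MXBeans. It is not Flink's metric system and says nothing about how Flink collects its values; the class name `JvmStats` and its method names are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JvmStats {

    /** Sum of collection counts over all garbage collectors of this JVM. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated GC time in milliseconds over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Heap bytes currently in use. */
    public static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %d bytes, GC count: %d, GC time: %d ms%n",
                heapUsedBytes(), totalGcCount(), totalGcTimeMillis());
    }
}
```

Sampling these values periodically and watching for GC time that grows much faster than wall-clock time is one way to spot long pauses; the sampling interval and any threshold would have to be chosen for the job at hand.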


2018-04-04 8:30 GMT+02:00 miki haiat <[hidden email]>:
Hi,

I checked the code again to figure out where the problem could be.

I just wondered whether I'm implementing the Evictor correctly?

Full code:




public static class EsbTraceEvictor implements Evictor<EsbTrace, GlobalWindow> {
    org.slf4j.Logger LOG = LoggerFactory.getLogger(EsbTraceEvictor.class);

    @Override
    public void evictBefore(Iterable<TimestampedValue<EsbTrace>> iterable, int i, GlobalWindow globalWindow, Evictor.EvictorContext evictorContext) {
        // nothing is evicted before the window function runs
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<EsbTrace>> elements, int i, GlobalWindow globalWindow, EvictorContext evictorContext) {
        // change it to current process time
        long min5min = LocalDateTime.now().minusMinutes(5).getNano();
        LOG.info("time now -5min {}", min5min);
        DateTimeFormatter format = DateTimeFormatter.ISO_DATE_TIME;
        for (Iterator<TimestampedValue<EsbTrace>> iterator = elements.iterator(); iterator.hasNext(); ) {
            TimestampedValue<EsbTrace> element = iterator.next();
            LocalDateTime el = LocalDateTime.parse(element.getValue().getEndDate(), format);
            LOG.info("element time {}", element.getValue().getEndDate());
            if (el.minusMinutes(5).getNano() <= min5min) {
                iterator.remove();
            }
        }
    }
}
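
One thing that stands out in the evictor above: `LocalDateTime.getNano()` returns only the nano-of-second field (0 to 999,999,999), not an absolute point in time, so comparing two `getNano()` values does not say which timestamp is older. Below is a minimal sketch of an age check that compares the full `LocalDateTime` values instead. It assumes the intent is to drop elements whose end date is more than five minutes old, which the thread does not confirm. The names `EvictionCheck`, `isOlderThan` and `evict` are hypothetical, and the Flink types are left out so the snippet runs on its own.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EvictionCheck {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_DATE_TIME;

    /** True if the parsed end date lies strictly before (now - maxAge). */
    public static boolean isOlderThan(String endDate, LocalDateTime now, Duration maxAge) {
        LocalDateTime end = LocalDateTime.parse(endDate, FORMAT);
        return end.isBefore(now.minus(maxAge));
    }

    /** Removes expired entries in place through the iterator, the way evictAfter does. */
    public static void evict(List<String> endDates, LocalDateTime now, Duration maxAge) {
        for (Iterator<String> it = endDates.iterator(); it.hasNext(); ) {
            if (isOlderThan(it.next(), now, maxAge)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Fixed "now" so the result is reproducible; real code would pass LocalDateTime.now().
        LocalDateTime now = LocalDateTime.of(2018, 4, 4, 12, 0, 0);
        List<String> endDates = new ArrayList<>(Arrays.asList(
                "2018-04-04T11:50:00",   // 10 minutes old: evicted
                "2018-04-04T11:58:30")); // 90 seconds old: kept
        evict(endDates, now, Duration.ofMinutes(5));
        System.out.println(endDates); // prints [2018-04-04T11:58:30]
    }
}
```

Passing `now` in as a parameter keeps the check deterministic and easy to test; inside an actual evictor the same comparison would be applied to each element's end date.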





On Tue, Apr 3, 2018 at 4:28 PM, Hao Sun <[hidden email]> wrote:
Hi Timo, we have a similar issue: a TM got killed by a job. Is there a way to monitor the JVM status? If it is through the monitoring metrics, which metrics should I look at?
We are running Flink on K8S. Is it possible that a job consumes so much network bandwidth that the JM and TM cannot connect?

On Tue, Apr 3, 2018 at 3:11 AM Timo Walther <[hidden email]> wrote:
Hi Miki,

To me this sounds like your job has a resource leak such that your memory fills up and the JVM of the TaskManager is killed at some point. What does your job look like? I see a WindowedStream.apply, which might not be appropriate if you have big/frequent windows where the evaluation happens too late, so that the state becomes too big.

Regards,
Timo


On 03.04.18 at 08:26, miki haiat wrote:
I tried to run Flink on Kubernetes and as a standalone HA cluster, and in both cases the task manager got lost/killed after a few hours/days.
I'm using Ubuntu and Flink 1.4.2.


This is part of the log; I also attached the full log.

org.tlv.esb.StreamingJob$EsbTraceEvictor@20ffca60, WindowedStream.apply(WindowedStream.java:1061)) -> Sink: Unnamed (1/1) (91b27853aa30be93322d9c516ec266bf) switched from RUNNING to FAILED.
java.lang.Exception: TaskManager was lost/killed: 6dc6cd5c15588b49da39a31b6480b2e3 @ beam2 (dataPort=42587)

flink.log (5M) Download Attachment