Hello,
I am using the brand-new 0.10 version and I have problems running a streaming job. My topology is linear: a custom source SC scans a directory and emits HDFS file names; a first mapper M1 opens each file and emits its lines; a filter F filters the lines; another mapper M2 transforms them; and a mapper/sink M3->SK stores them in HDFS.
SC->M1->F->M2->M3->SK
The M2 transformer uses a fair amount of RAM because, when it opens, it loads an 11M-row static table into a hash map to enrich the lines. I use 55 slots on YARN: 11 containers of 12 GB with 5 slots each.
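The enrichment step described above can be sketched in plain Java as follows. This is only an illustration: the class name, the `;`-separated record layout, and the `open`/`map` signatures are assumptions, not the actual M2 code or the Flink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the M2 transformer (names and record layout are assumptions).
final class EnrichingMapper {
    // Static reference table, filled once when the mapper is opened.
    private Map<String, String> referenceTable;

    /** Loads the reference rows (key, value) into a hash map before any record is processed. */
    void open(Iterable<String[]> rows) {
        referenceTable = new HashMap<>();
        for (String[] row : rows) {
            referenceTable.put(row[0], row[1]);
        }
    }

    /** Enriches one line using its first field as the lookup key; nothing is kept between calls. */
    String map(String line) {
        String key = line.split(";", 2)[0];
        String extra = referenceTable.get(key);
        return extra == null ? line : line + ";" + extra;
    }
}
```

In a sketch like this, each parallel instance holds its own copy of the table, so a TaskManager with 5 slots could hold up to five copies. Whether the real M2 behaves this way is not stated here.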
To my understanding, I should not have any memory problem since each record is independent: no join, no key, no aggregation, no window. It is a simple flow mapper, with RAM used only as a buffer. However, if I submit enough input data, the app systematically crashes with a “Connection unexpectedly closed by remote task manager” exception, and the first error in the YARN log shows that “a container is running beyond physical memory limits”.
If I increase the container size, I simply need to feed in more data to make the crash happen.
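A rough sanity check of the figures in the YARN log extract further down can be sketched in Java. It converts the RSSMEM_USAGE(PAGES) value of the java process into bytes, assuming 4 KiB pages (the log does not state the page size), and compares the -Xmx and -XX:MaxDirectMemorySize flags with the 12 GB container limit. The class and method names are hypothetical.

```java
import java.util.Locale;

// Sanity check of the memory figures quoted in the YARN log.
// Assumption: 4 KiB memory pages (the page size is not stated in the log).
final class ContainerMemoryCheck {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Converts a resident-set size given in pages into bytes. */
    static long pagesToBytes(long pages, long pageSizeBytes) {
        return pages * pageSizeBytes;
    }

    /** Upper bound that the JVM flags allow for heap plus direct memory, in bytes. */
    static long jvmUpperBoundBytes(long xmxMiB, long maxDirectMiB) {
        return (xmxMiB + maxDirectMiB) * MIB;
    }

    public static void main(String[] args) {
        long containerLimit = 12L * GIB;                   // "12 GB physical memory"
        long rss = pagesToBytes(3_165_572L, 4096L);        // RSSMEM_USAGE(PAGES) of the java process
        long jvmBound = jvmUpperBoundBytes(9216L, 9216L);  // -Xmx9216m, -XX:MaxDirectMemorySize=9216m

        System.out.printf(Locale.ROOT, "resident set : %.2f GiB%n", (double) rss / GIB);
        System.out.printf(Locale.ROOT, "container cap: %.2f GiB%n", (double) containerLimit / GIB);
        System.out.printf(Locale.ROOT, "heap + direct: %.2f GiB allowed by the JVM flags%n", (double) jvmBound / GIB);
        System.out.println("RSS over cap : " + (rss > containerLimit));
    }
}
```

Under the 4 KiB assumption the resident set comes out slightly above 12 GiB, consistent with the "12.1 GB of 12 GB" diagnostic, and the two flags together allow up to 18 GiB of heap plus direct memory, so the JVM settings alone do not keep the process under the container limit. This does not show which part of the memory actually grows.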
Any idea?
Greetings,
Arnaud
_________________________________
Exceptions shown in the Flink dashboard:
Root exception:
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'bt1shli6/172.21.125.31:33186'.
This might indicate that the remote task manager was lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:744)
Worker exception:
java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager afe405e5c982a0317dd5b98108a33eef @ h1r2dn09 - 5 slots - URL: akka.tcp://flink@172.21.125.31:42012/user/taskmanager
at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:152)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
at akka.actor.ActorCell.invoke(ActorCell.scala:486)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
YARN log extract:
13:46:18,346 INFO org.apache.flink.yarn.ApplicationMaster - YARN daemon runs as yarn setting user to execute Flink ApplicationMaster/JobManager to datcrypt
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - --------------------------------------------------------------------------------
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - Starting YARN ApplicationMaster/JobManager (Version: 0.10.0, Rev:ab2cca4, Date:10.11.2015 @ 13:50:14 UTC)
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - Current user: yarn
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.7/24.45-b08
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - Maximum heap size: 1388 MiBytes
13:46:18,351 INFO org.apache.flink.yarn.ApplicationMaster - JAVA_HOME: /usr/java/default
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - Hadoop version: 2.6.0
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - JVM Options:
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - -Xmx1448M
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - -Dlog.file=/data/7/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000001/jobmanager.log
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - -Dlogback.configurationFile=file:logback.xml
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - -Dlog4j.configuration=file:log4j.properties
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - Program Arguments: (none)
13:46:18,353 INFO org.apache.flink.yarn.ApplicationMaster - --------------------------------------------------------------------------------
13:46:18,355 INFO org.apache.flink.yarn.ApplicationMaster - registered UNIX signal handlers for [TERM, HUP, INT]
13:46:18,375 INFO org.apache.flink.yarn.ApplicationMaster - Starting ApplicationMaster/JobManager in streaming mode
13:46:18,376 INFO org.apache.flink.yarn.ApplicationMaster - Loading config from: /data/10/hadoop/yarn/local/usercache/datcrypt/appcache/application_1444907929501_7416/container_e05_1444907929501_7416_01_000001.
13:46:18,416 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager
13:46:18,421 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor system at 172.21.125.28:0
13:46:19,022 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
13:46:19,081 INFO Remoting - Starting remoting
13:46:19,248 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@172.21.125.28:50891]
13:46:19,256 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
13:46:19,283 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /tmp/flink-web-cfd99c17-62ed-4bc8-b2d5-9fde1248ee75 for the web interface files
13:46:19,284 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving job manager log from /data/7/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000001/jobmanager.log
13:46:19,284 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Serving job manager stdout from /data/7/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000001/jobmanager.out
13:46:19,614 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Web frontend listening at 0:0:0:0:0:0:0:0:55890
13:46:19,616 INFO org.apache.flink.runtime.jobmanager.JobManager - Starting JobManager actor
13:46:19,623 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-1dc260e8-ff31-4b07-9bf6-8630504fd2c6
13:46:19,624 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:35970 - max concurrent requests: 50 - max backlog: 1000
13:46:19,643 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Starting with JobManager akka.tcp://flink@172.21.125.28:50891/user/jobmanager on port 55890
13:46:19,643 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@172.21.125.28:50891/user/jobmanager:null.
13:46:19,644 INFO org.apache.flink.runtime.jobmanager.MemoryArchivist - Started memory archivist akka://flink/user/archive
13:46:19,644 INFO org.apache.flink.yarn.YarnJobManager - Starting JobManager at akka.tcp://flink@172.21.125.28:50891/user/jobmanager.
13:46:19,652 INFO org.apache.flink.yarn.ApplicationMaster - Generate configuration file for application master.
13:46:19,656 INFO org.apache.flink.yarn.YarnJobManager - JobManager akka.tcp://flink@172.21.125.28:50891/user/jobmanager was granted leadership with leader session ID None.
13:46:19,668 INFO org.apache.flink.yarn.ApplicationMaster - Starting YARN session on Job Manager.
13:46:19,668 INFO org.apache.flink.yarn.ApplicationMaster - Application Master properly initiated. Awaiting termination of actor system.
13:46:19,680 INFO org.apache.flink.yarn.YarnJobManager - Start yarn session.
13:46:19,762 INFO org.apache.flink.yarn.YarnJobManager - Yarn session with 11 TaskManagers. Tolerating 11 failed TaskManagers
13:46:20,471 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at h1r1nn01.bpa.bouyguestelecom.fr/172.21.125.3:8030
13:46:20,502 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - yarn.client.max-cached-nodemanagers-proxies : 0
13:46:20,503 INFO org.apache.flink.yarn.YarnJobManager - Registering ApplicationMaster with tracking url http://h1r2dn06.bpa.bouyguestelecom.fr:55890.
13:46:20,675 INFO org.apache.flink.yarn.YarnJobManager - Retrieved 0 TaskManagers from previous attempts.
13:46:20,680 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 0.
13:46:20,685 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 1.
13:46:20,685 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 2.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 3.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 4.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 5.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 6.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 7.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 8.
13:46:20,686 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 9.
13:46:20,687 INFO org.apache.flink.yarn.YarnJobManager - Requesting initial TaskManager container 10.
13:46:20,737 INFO org.apache.flink.yarn.Utils - Copying from file:/data/10/hadoop/yarn/local/usercache/datcrypt/appcache/application_1444907929501_7416/container_e05_1444907929501_7416_01_000001/flink-conf-modified.yaml to hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/datcrypt/.flink/application_1444907929501_7416/flink-conf-modified.yaml
13:46:20,983 INFO org.apache.flink.yarn.YarnJobManager - Prepared local resource for modified yaml: resource { scheme: "hdfs" host: "h1r1nn01.bpa.bouyguestelecom.fr" port: 8020 file: "/user/datcrypt/.flink/application_1444907929501_7416/flink-conf-modified.yaml" } size: 5209 timestamp: 1447418780914 type: FILE visibility: APPLICATION
13:46:20,990 INFO org.apache.flink.yarn.YarnJobManager - Create container launch context.
13:46:20,999 INFO org.apache.flink.yarn.YarnJobManager - Starting TM with command=$JAVA_HOME/bin/java -Xms9216m -Xmx9216m -XX:MaxDirectMemorySize=9216m -Dlog.file="<LOG_DIR>/taskmanager.log" -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1> <LOG_DIR>/taskmanager.out 2> <LOG_DIR>/taskmanager.err --streamingMode streaming
13:46:21,551 INFO org.apache.flink.yarn.YarnJobManager - The user requested 11 containers, 0 running. 11 containers missing
13:46:22,085 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn11.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn08.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn06.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn04.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn02.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn03.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn08.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn09.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn12.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn07.bpa.bouyguestelecom.fr:45454
13:46:22,086 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn11.bpa.bouyguestelecom.fr:45454
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000002
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000003
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000004
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000005
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000006
13:46:22,102 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000007
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000008
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000009
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000010
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000011
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000012
13:46:22,103 INFO org.apache.flink.yarn.YarnJobManager - The user requested 11 containers, 0 running. 11 containers missing
13:46:22,104 INFO org.apache.flink.yarn.YarnJobManager - 11 containers already allocated by YARN. Starting...
13:46:22,106 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r1dn11.bpa.bouyguestelecom.fr:45454
13:46:22,151 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000002 on host h1r1dn11.bpa.bouyguestelecom.fr).
13:46:22,152 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn08.bpa.bouyguestelecom.fr:45454
13:46:22,162 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000003 on host h1r2dn08.bpa.bouyguestelecom.fr).
13:46:22,162 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn06.bpa.bouyguestelecom.fr:45454
13:46:22,171 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000004 on host h1r2dn06.bpa.bouyguestelecom.fr).
13:46:22,172 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r1dn04.bpa.bouyguestelecom.fr:45454
13:46:22,182 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000005 on host h1r1dn04.bpa.bouyguestelecom.fr).
13:46:22,183 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r1dn02.bpa.bouyguestelecom.fr:45454
13:46:22,191 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000006 on host h1r1dn02.bpa.bouyguestelecom.fr).
13:46:22,192 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn03.bpa.bouyguestelecom.fr:45454
13:46:22,202 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000007 on host h1r2dn03.bpa.bouyguestelecom.fr).
13:46:22,202 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r1dn08.bpa.bouyguestelecom.fr:45454
13:46:22,257 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000008 on host h1r1dn08.bpa.bouyguestelecom.fr).
13:46:22,258 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn09.bpa.bouyguestelecom.fr:45454
13:46:22,267 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000009 on host h1r2dn09.bpa.bouyguestelecom.fr).
13:46:22,268 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r1dn12.bpa.bouyguestelecom.fr:45454
13:46:22,279 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000010 on host h1r1dn12.bpa.bouyguestelecom.fr).
13:46:22,280 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn07.bpa.bouyguestelecom.fr:45454
13:46:22,290 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000011 on host h1r2dn07.bpa.bouyguestelecom.fr).
13:46:22,290 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn11.bpa.bouyguestelecom.fr:45454
13:46:22,298 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000012 on host h1r2dn11.bpa.bouyguestelecom.fr).
13:46:26,430 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn06 (akka.tcp://flink@172.21.125.28:36435/user/taskmanager) as ec3fba3dc109840893fdefadf9d25769. Current number of registered hosts is 1. Current number of alive task slots is 5.
13:46:26,570 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r1dn08 (akka.tcp://flink@172.21.125.18:49837/user/taskmanager) as d118671119071c18654fd99ce0ef2b26. Current number of registered hosts is 2. Current number of alive task slots is 10.
13:46:26,580 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn11 (akka.tcp://flink@172.21.125.33:45822/user/taskmanager) as f90850c024cdcf76f97d6a6f80aa656b. Current number of registered hosts is 3. Current number of alive task slots is 15.
13:46:26,650 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn08 (akka.tcp://flink@172.21.125.30:52119/user/taskmanager) as 61397a13eccdddb37ff71e28b6655c3c. Current number of registered hosts is 4. Current number of alive task slots is 20.
13:46:26,819 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r1dn12 (akka.tcp://flink@172.21.125.22:42508/user/taskmanager) as 2a1c431c9c5727a727c685eb2bbf2ab0. Current number of registered hosts is 5. Current number of alive task slots is 25.
13:46:26,874 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn07 (akka.tcp://flink@172.21.125.29:43648/user/taskmanager) as 46960d7a66ba385f359a4fdc86fae507. Current number of registered hosts is 6. Current number of alive task slots is 30.
13:46:26,913 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn09 (akka.tcp://flink@172.21.125.31:42048/user/taskmanager) as 44bf60d7f6bc695548b06953ce8c7a01. Current number of registered hosts is 7. Current number of alive task slots is 35.
13:46:27,083 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r1dn02 (akka.tcp://flink@172.21.125.12:46609/user/taskmanager) as d4c153b853ad95e16b3d9f311381d56d. Current number of registered hosts is 8. Current number of alive task slots is 40.
13:46:27,161 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r1dn04 (akka.tcp://flink@172.21.125.14:58610/user/taskmanager) as 051a3c73e96111f0ab04608583c317ff. Current number of registered hosts is 9. Current number of alive task slots is 45.
13:46:27,496 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r1dn11 (akka.tcp://flink@172.21.125.21:36403/user/taskmanager) as 098dee828632d7f3ddc2877e997df866. Current number of registered hosts is 10. Current number of alive task slots is 50.
13:46:29,159 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn03 (akka.tcp://flink@172.21.125.25:58890/user/taskmanager) as bb8607fb775d2219ed384c7f15bda19e. Current number of registered hosts is 11. Current number of alive task slots is 55.
13:49:16,869 INFO org.apache.flink.yarn.YarnJobManager - Submitting job 0348520c75b5c18c9833d8eb4732c009 (KUBERA-GEO-SOURCE2BRUT).
13:49:16,873 INFO org.apache.flink.yarn.YarnJobManager - Scheduling job 0348520c75b5c18c9833d8eb4732c009 (KUBERA-GEO-SOURCE2BRUT).
13:49:16,873 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Scan /PHOCUS2/DEV/UBV/IN (1/1) (9aa2539432a19f5b401b92ebcc32ab4e) switched from CREATED to SCHEDULED
13:49:16,873 INFO org.apache.flink.yarn.YarnJobManager - Status of job 0348520c75b5c18c9833d8eb4732c009 (KUBERA-GEO-SOURCE2BRUT) changed to RUNNING.
13:49:16,874 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Scan /PHOCUS2/DEV/UBV/IN (1/1) (9aa2539432a19f5b401b92ebcc32ab4e) switched from SCHEDULED to DEPLOYING
13:49:16,874 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Scan /PHOCUS2/DEV/UBV/IN (1/1) (attempt #0) to h1r2dn09
13:49:16,874 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (1/50) (764f26e2ae4f5d68e7881928dee8eea8) switched from CREATED to SCHEDULED
13:49:16,875 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (1/50) (764f26e2ae4f5d68e7881928dee8eea8) switched from SCHEDULED to DEPLOYING
13:49:16,875 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (1/50) (attempt #0) to h1r2dn09
13:49:16,876 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (2/50) (f9ca3935b76ef8376a55c31b57e38388) switched from CREATED to SCHEDULED
13:49:16,876 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (2/50) (f9ca3935b76ef8376a55c31b57e38388) switched from SCHEDULED to DEPLOYING
(…)
(The app runs well)
(…)
14:18:27,692 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@172.21.125.31:42048] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
14:18:29,987 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000009 is completed with diagnostics: Container [pid=44905,containerID=container_e05_1444907929501_7416_01_000009] is running beyond physical memory limits. Current usage: 12.1 GB of 12 GB physical memory used; 15.9 GB of 25.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e05_1444907929501_7416_01_000009 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 44905 44903 44905 44905 (bash) 0 0 108650496 311 /bin/bash -c /usr/java/default/bin/java -Xms9216m -Xmx9216m -XX:MaxDirectMemorySize=9216m -Dlog.file=/data/2/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000009/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir . 1> /data/2/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000009/taskmanager.out 2> /data/2/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000009/taskmanager.err --streamingMode streaming
|- 44914 44905 44905 44905 (java) 345746 7991 17012555776 3165572 /usr/java/default/bin/java -Xms9216m -Xmx9216m -XX:MaxDirectMemorySize=9216m -Dlog.file=/data/2/hadoop/yarn/log/application_1444907929501_7416/container_e05_1444907929501_7416_01_000009/taskmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnTaskManagerRunner --configDir . --streamingMode streaming
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
14:18:29,988 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000009 was a running container. Total failed containers 1.
14:18:29,988 INFO org.apache.flink.yarn.YarnJobManager - The user requested 11 containers, 10 running. 1 containers missing
14:18:29,989 INFO org.apache.flink.yarn.YarnJobManager - There are 1 containers missing. 0 are already requested. Requesting 1 additional container(s) from YARN. Reallocation of failed containers is enabled=true ('yarn.reallocate-failed')
14:18:29,990 INFO org.apache.flink.yarn.YarnJobManager - Requested additional container from YARN. Pending requests 1.
14:18:30,503 INFO org.apache.flink.yarn.YarnJobManager - The user requested 11 containers, 10 running. 1 containers missing
14:18:31,028 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn10.bpa.bouyguestelecom.fr:45454
14:18:31,028 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn05.bpa.bouyguestelecom.fr:45454
14:18:31,028 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn02.bpa.bouyguestelecom.fr:45454
14:18:31,028 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn12.bpa.bouyguestelecom.fr:45454
14:18:31,028 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn04.bpa.bouyguestelecom.fr:45454
14:18:31,028 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000013
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000014
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000015
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000016
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000017
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000018
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000019
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000020
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000021
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000022
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - The user requested 11 containers, 10 running. 1 containers missing
14:18:31,029 INFO org.apache.flink.yarn.YarnJobManager - 10 containers already allocated by YARN. Starting...
14:18:31,030 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : h1r2dn03.bpa.bouyguestelecom.fr:45454
14:18:31,041 INFO org.apache.flink.yarn.YarnJobManager - Launching container (container_e05_1444907929501_7416_01_000013 on host h1r2dn03.bpa.bouyguestelecom.fr).
14:18:31,042 INFO org.apache.flink.yarn.YarnJobManager - Flink has 9 allocated containers which are not needed right now. Returning them
14:18:33,030 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (20/50) (7f278c64beacbad498199a4e7cb1228f) switched from RUNNING to FAILED
14:18:33,032 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Scan /PHOCUS2/DEV/UBV/IN (1/1) (9aa2539432a19f5b401b92ebcc32ab4e) switched from RUNNING to CANCELING
14:18:33,032 INFO org.apache.flink.yarn.YarnJobManager - Status of job 0348520c75b5c18c9833d8eb4732c009 (KUBERA-GEO-SOURCE2BRUT) changed to FAILING.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'bt1shli6/172.21.125.31:33186'.
This might indicate that the remote task manager was lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:744)
14:18:33,032 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph
14:18:33,032 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (1/50) (764f26e2ae4f5d68e7881928dee8eea8)
switched from RUNNING to CANCELING
14:18:33,033 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (2/50) (f9ca3935b76ef8376a55c31b57e38388)
switched from RUNNING to CANCELING
14:18:33,033 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (3/50) (c586e6e833e331aaa48a68d57f9252bf)
switched from RUNNING to CANCELING
14:18:33,033 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (4/50) (5ed0f29817b241e9eb04a8876fc76737)
switched from RUNNING to CANCELING
14:18:33,033 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (5/50) (b572b973c66f3052c431fe6c10e5f1ff)
switched from RUNNING to CANCELING
14:18:33,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (6/50) (48d3e7ba592b47add633f6e65bf4fba7)
switched from RUNNING to CANCELING
14:18:33,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (7/50) (66b0717c028e19a4df23dee6341b56cd)
switched from RUNNING to CANCELING
14:18:33,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (8/50) (193fd3a321a5c5711d369691544fa7f7)
switched from RUNNING to CANCELING
14:18:33,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (9/50) (91a0e1349ee875d8f4786a204f6096a3)
switched from RUNNING to CANCELING
14:18:33,034 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (10/50) (1ddea0d9b8fb4a58e8f3150756267b9d)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (11/50) (bc26fe9aaf621f9d9c8e5358d2f1208f)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (12/50) (d71455ecb37eeb387a1da56a2f5964de)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (13/50) (a57717ee1b7e9e91091429b4a3c7dbb3)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (14/50) (8b1d11ed1a3cc51ba981b1006b7eb757)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (15/50) (f3e8182fca0651ca2c8a70bd660695c3)
switched from RUNNING to CANCELING
14:18:33,035 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (16/50) (bf5e18f069b74fc284bcf7b324c30277)
switched from RUNNING to CANCELING
14:18:33,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (17/50) (fcf8dba87b366a50a9c048dcca6cc18a)
switched from RUNNING to CANCELING
14:18:33,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (18/50) (811bf08e6b0c801dee0e4092777d3802)
switched from RUNNING to CANCELING
14:18:33,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (19/50) (93fb94616a05b5a0fa0ca7e553778565)
switched from RUNNING to CANCELING
14:18:33,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (21/50) (28193eaa9448382b8f462189400519c6)
switched from RUNNING to CANCELING
14:18:33,036 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (22/50) (d6548a95eb0d96c130c42bbe3cfa2c0b)
switched from RUNNING to CANCELING
14:18:33,037 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (23/50) (1e9f2598f8bb97d494bb3fe16a9577b2)
switched from RUNNING to CANCELING
14:18:33,037 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (24/50) (1aca8044e5026f741089945b02987435)
switched from RUNNING to CANCELING
14:18:33,037 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (25/50) (42c877823146567658ad6be8f5825744)
switched from RUNNING to CANCELING
14:18:33,037 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (26/50) (9883561c52a4c7ae4decdcc3ce9a371f)
switched from RUNNING to CANCELING
14:18:33,037 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (27/50) (5875ea808ef3dc5c561066018dc87e28)
switched from RUNNING to CANCELING
14:18:33,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (28/50) (fd88b4c83cf67be6a6e6f73b7a5b83b3)
switched from RUNNING to CANCELING
14:18:33,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (29/50) (5b8570f5447752c74c904e6ee73c5e20)
switched from RUNNING to CANCELING
14:18:33,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (30/50) (a0c2817b7fa7c4c74c74d0273bd6422c)
switched from RUNNING to CANCELING
14:18:33,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (31/50) (b9bbfe013b3f4e6cd3203dd8d6440383)
switched from RUNNING to CANCELING
14:18:33,038 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (32/50) (60686a6909ab09b03ea6ee8fd87c8a99)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (33/50) (e1735a0ce66449f357936dcc147053b9)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (34/50) (a368834aec80833291d4a4a5cf974b5d)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (35/50) (d37e0742c5fbf0798c02f1384e8f9bc0)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (36/50) (6154f1df4b63ad96a7d4d03b2a633fb1)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (37/50) (02f33613549b380fd901fcef94136eaa)
switched from RUNNING to CANCELING
14:18:33,039 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (38/50) (34aaaa6576c48b55225e2b446d663621)
switched from RUNNING to CANCELING
14:18:33,040 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (39/50) (b1d4c9aec8114c3889e2c02da0e3d41d)
switched from RUNNING to CANCELING
14:18:33,040 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (40/50) (63833c22e3b6b79d15f12c5b5bb6a70b)
switched from RUNNING to CANCELING
14:18:33,040 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (41/50) (25c16c9065a5486f0e8dcd0cc3f9c6d1)
switched from RUNNING to CANCELING
14:18:33,040 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (42/50) (7b03ade33a4048d3ceed773fda2f9b23)
switched from RUNNING to CANCELING
14:18:33,040 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (43/50) (8424b4ef2960c29fdcae110e565b0316)
switched from RUNNING to CANCELING
14:18:33,041 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (44/50) (921a6a7068b2830c4cb4dd5445cae1f2)
switched from RUNNING to CANCELING
14:18:33,041 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (45/50) (e197594d273125a930335d93cf51254b)
switched from RUNNING to CANCELING
14:18:33,041 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (46/50) (885e3d1c85765a9270fdee1bfd9b9852)
switched from RUNNING to CANCELING
14:18:33,041 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (47/50) (146174eb5be9dbe2296d806783f61868)
switched from RUNNING to CANCELING
14:18:33,042 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (48/50) (9e205e3d4bf6c665c3b817c3b622fef4)
switched from RUNNING to CANCELING
14:18:33,042 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (49/50) (2650dc529b7149928d41ec35096fd475)
switched from RUNNING to CANCELING
14:18:33,042 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (50/50) (93038fac2c92e2b79623579968f9a0da)
switched from RUNNING to CANCELING
14:18:33,063 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@172.21.125.31:42048].
Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /172.21.125.31:42048
14:18:33,067 INFO org.apache.flink.yarn.YarnJobManager - Task manager akka.tcp://flink@172.21.125.31:42048/user/taskmanager terminated.
14:18:33,068 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (4/50) (5ed0f29817b241e9eb04a8876fc76737)
switched from CANCELING to FAILED
14:18:33,068 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (1/50) (764f26e2ae4f5d68e7881928dee8eea8)
switched from CANCELING to FAILED
14:18:33,069 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Scan /PHOCUS2/DEV/UBV/IN (1/1) (9aa2539432a19f5b401b92ebcc32ab4e)
switched from CANCELING to FAILED
14:18:33,069 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (2/50) (f9ca3935b76ef8376a55c31b57e38388)
switched from CANCELING to FAILED
14:18:33,070 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (5/50) (b572b973c66f3052c431fe6c10e5f1ff)
switched from CANCELING to FAILED
14:18:33,070 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (3/50) (c586e6e833e331aaa48a68d57f9252bf)
switched from CANCELING to FAILED
14:18:33,071 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager akka.tcp://flink@172.21.125.31:42048/user/taskmanager.
Number of registered task managers 10. Number of available slots 50.
14:18:33,332 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (8/50) (193fd3a321a5c5711d369691544fa7f7)
switched from CANCELING to CANCELED
14:18:33,403 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (11/50) (bc26fe9aaf621f9d9c8e5358d2f1208f)
switched from CANCELING to CANCELED
14:18:34,050 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (24/50) (1aca8044e5026f741089945b02987435)
switched from CANCELING to CANCELED
14:18:34,522 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (35/50) (d37e0742c5fbf0798c02f1384e8f9bc0)
switched from CANCELING to CANCELED
14:18:34,683 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (48/50) (9e205e3d4bf6c665c3b817c3b622fef4)
switched from CANCELING to CANCELED
14:18:34,714 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (16/50) (bf5e18f069b74fc284bcf7b324c30277)
switched from CANCELING to CANCELED
14:18:34,919 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at h1r2dn03 (akka.tcp://flink@172.21.125.25:44296/user/taskmanager)
as d65581cce8a29775ba3df10d02756d11. Current number of registered hosts is 11. Current number of alive task slots is 55.
14:18:35,410 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (31/50) (b9bbfe013b3f4e6cd3203dd8d6440383)
switched from CANCELING to CANCELED
14:18:35,513 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (30/50) (a0c2817b7fa7c4c74c74d0273bd6422c)
switched from CANCELING to CANCELED
14:18:35,739 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (7/50) (66b0717c028e19a4df23dee6341b56cd)
switched from CANCELING to CANCELED
14:18:35,896 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (41/50) (25c16c9065a5486f0e8dcd0cc3f9c6d1)
switched from CANCELING to CANCELED
14:18:35,948 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (13/50) (a57717ee1b7e9e91091429b4a3c7dbb3)
switched from CANCELING to CANCELED
14:18:36,073 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r1dn09.bpa.bouyguestelecom.fr:45454
14:18:36,073 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : h1r2dn01.bpa.bouyguestelecom.fr:45454
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000023
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Got new container for allocation: container_e05_1444907929501_7416_01_000024
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000017 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000014 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000018 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000022 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000019 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000020 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000016 is completed with
diagnostics: Container released by application
14:18:36,074 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000015 is completed with
diagnostics: Container released by application
14:18:36,075 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000021 is completed with
diagnostics: Container released by application
14:18:36,075 INFO org.apache.flink.yarn.YarnJobManager - Flink has 2 allocated containers which are not needed right now. Returning
them
14:18:36,197 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (6/50) (48d3e7ba592b47add633f6e65bf4fba7)
switched from CANCELING to CANCELED
14:18:36,331 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (38/50) (34aaaa6576c48b55225e2b446d663621)
switched from CANCELING to CANCELED
14:18:36,540 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (34/50) (a368834aec80833291d4a4a5cf974b5d)
switched from CANCELING to CANCELED
14:18:36,645 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (46/50) (885e3d1c85765a9270fdee1bfd9b9852)
switched from CANCELING to CANCELED
14:18:37,084 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (22/50) (d6548a95eb0d96c130c42bbe3cfa2c0b)
switched from CANCELING to CANCELED
14:18:37,398 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (39/50) (b1d4c9aec8114c3889e2c02da0e3d41d)
switched from CANCELING to CANCELED
14:18:37,834 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (28/50) (fd88b4c83cf67be6a6e6f73b7a5b83b3)
switched from CANCELING to CANCELED
14:18:38,433 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (14/50) (8b1d11ed1a3cc51ba981b1006b7eb757)
switched from CANCELING to CANCELED
14:18:38,542 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (42/50) (7b03ade33a4048d3ceed773fda2f9b23)
switched from CANCELING to CANCELED
14:18:39,011 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (12/50) (d71455ecb37eeb387a1da56a2f5964de)
switched from CANCELING to CANCELED
14:18:39,088 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (44/50) (921a6a7068b2830c4cb4dd5445cae1f2)
switched from CANCELING to CANCELED
14:18:39,194 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (10/50) (1ddea0d9b8fb4a58e8f3150756267b9d)
switched from CANCELING to CANCELED
14:18:39,625 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (21/50) (28193eaa9448382b8f462189400519c6)
switched from CANCELING to CANCELED
14:18:40,299 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (29/50) (5b8570f5447752c74c904e6ee73c5e20)
switched from CANCELING to CANCELED
14:18:40,572 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (18/50) (811bf08e6b0c801dee0e4092777d3802)
switched from CANCELING to CANCELED
14:18:40,985 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (23/50) (1e9f2598f8bb97d494bb3fe16a9577b2)
switched from CANCELING to CANCELED
14:18:41,094 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000024 is completed with
diagnostics: Container released by application
14:18:41,094 INFO org.apache.flink.yarn.YarnJobManager - Container container_e05_1444907929501_7416_01_000023 is completed with
diagnostics: Container released by application
14:18:41,162 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (15/50) (f3e8182fca0651ca2c8a70bd660695c3)
switched from CANCELING to CANCELED
14:18:41,625 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (40/50) (63833c22e3b6b79d15f12c5b5bb6a70b)
switched from CANCELING to CANCELED
14:18:41,710 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (47/50) (146174eb5be9dbe2296d806783f61868)
switched from CANCELING to CANCELED
14:18:41,828 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (37/50) (02f33613549b380fd901fcef94136eaa)
switched from CANCELING to CANCELED
14:18:41,932 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (27/50) (5875ea808ef3dc5c561066018dc87e28)
switched from CANCELING to CANCELED
14:18:41,947 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (19/50) (93fb94616a05b5a0fa0ca7e553778565)
switched from CANCELING to CANCELED
14:18:42,806 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (49/50) (2650dc529b7149928d41ec35096fd475)
switched from CANCELING to CANCELED
14:18:43,286 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (50/50) (93038fac2c92e2b79623579968f9a0da)
switched from CANCELING to CANCELED
14:18:43,872 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (33/50) (e1735a0ce66449f357936dcc147053b9)
switched from CANCELING to CANCELED
14:18:44,474 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (45/50) (e197594d273125a930335d93cf51254b)
switched from CANCELING to CANCELED
14:18:44,594 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (17/50) (fcf8dba87b366a50a9c048dcca6cc18a)
switched from CANCELING to CANCELED
14:18:44,647 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (36/50) (6154f1df4b63ad96a7d4d03b2a633fb1)
switched from CANCELING to CANCELED
14:18:44,926 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (25/50) (42c877823146567658ad6be8f5825744)
switched from CANCELING to CANCELED
14:18:46,892 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (32/50) (60686a6909ab09b03ea6ee8fd87c8a99)
switched from CANCELING to CANCELED
14:18:47,489 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (9/50) (91a0e1349ee875d8f4786a204f6096a3)
switched from CANCELING to CANCELED
14:18:47,547 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (26/50) (9883561c52a4c7ae4decdcc3ce9a371f)
switched from CANCELING to CANCELED
14:18:52,491 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Flat Map -> Filter -> Flat Map -> Map -> Sink: Unnamed (43/50) (8424b4ef2960c29fdcae110e565b0316)
switched from CANCELING to CANCELED
14:18:52,493 INFO org.apache.flink.yarn.YarnJobManager - Status of job 0348520c75b5c18c9833d8eb4732c009 (KUBERA-GEO-SOURCE2BRUT)
changed to FAILED.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'bt1shli6/172.21.125.31:33186'.
This might indicate that the remote task manager was lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:119)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:306)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828)
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:744)
14:18:53,017 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@127.0.0.1:38583] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].
14:20:13,057 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@172.21.125.31:42048].
Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /172.21.125.31:42048
14:21:53,076 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@172.21.125.31:42048].
Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /172.21.125.31:42048