"Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

LINZ, Arnaud

Hello,

 

I have moved from 0.9.0 and tried both 0.9-SNAPSHOT and 0.10-SNAPSHOT so that I can keep running my batch job on my secured cluster, thanks to [FLINK-2555].

My application works nicely in local mode, and also in YARN mode using a job container started with yarn-session.sh, but it fails in -m yarn-cluster mode.

 

The YARN logs say “Flink YARN Client requested shutdown”, but I did nothing like that (at least not intentionally). The nodes are not even starting, and exec() does not return any JobExecutionResult.

 

My command line was:

flink run -m yarn-cluster -yd -yn 2 -ytm 1500 -yqu default -ys 4 --class <myMainClass> <myJar> <some options>

 

Any idea what I’ve done wrong?

 

Greetings,

Arnaud

 

PS - YARN log extract:

(…)

09:56:29,111 INFO  org.apache.flink.yarn.YarnTaskManager                         - Successful registration at JobManager (akka.tcp://flink@172.19.115.51:54806/user/jobmanager), starting network stack and library cache.

09:56:29,817 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful initialization (took 73 ms).

09:56:29,889 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful initialization (took 55 ms). Listening on SocketAddress /172.19.115.52:41920.

09:56:29,890 INFO  org.apache.flink.yarn.YarnTaskManager                         - Determined BLOB server address to be /172.19.115.51:38505. Starting BLOB cache.

09:56:29,893 INFO  org.apache.flink.runtime.blob.BlobCache                       - Created BLOB cache storage directory /tmp/blobStore-7150f7d7-f7a3-4c4c-9cda-3877da5aacd6

09:56:52,367 INFO  org.apache.flink.yarn.YarnTaskManager                         - Received task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3)

09:56:52,375 INFO  org.apache.flink.yarn.YarnTaskManager                         - Received task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (3/3)

09:56:52,383 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3)

09:56:52,387 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (3/3)

09:56:52,394 INFO  org.apache.flink.yarn.YarnTaskManager                         - Received task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3)

09:56:52,402 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3)

09:56:52,425 INFO  org.apache.flink.yarn.YarnTaskManager                         - Received task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3)

09:56:52,429 INFO  org.apache.flink.runtime.taskmanager.Task                     - Loading JAR files for task CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3)

09:56:52,454 INFO  org.apache.flink.yarn.YarnTaskManager                         - Stopping YARN TaskManager with final application status FAILED and diagnostics: Flink YARN Client requested shutdown

09:56:52,480 INFO  org.apache.flink.yarn.YarnTaskManager                         - Stopping TaskManager akka://flink/user/taskmanager#2116513584.

09:56:52,483 INFO  org.apache.flink.yarn.YarnTaskManager                         - Cancelling all computations and discarding all cached data.

09:56:52,486 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (3/3)

09:56:52,486 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (3/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,511 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3)

09:56:52,511 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,515 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3)

09:56:52,515 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,518 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3)

09:56:52,519 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,528 INFO  org.apache.flink.yarn.YarnTaskManager                         - Disassociating from JobManager

09:56:53,242 INFO  org.apache.flink.runtime.blob.BlobCache                       - Downloading fa68a8a2d6075c8e3692e1f1ac34dc2dba3d201e from /172.19.115.51:38505

09:56:53,257 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful shutdown (took 10 ms).

09:56:53,263 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful shutdown (took 4 ms).

 





Re: "Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

rmetzger0
Is the log from 0.9-SNAPSHOT or 0.10-SNAPSHOT?

Can you send me (privately, if you prefer) the full log of the YARN application:

yarn logs -applicationId <appId>

We need to find out why the TaskManagers are shutting down. That is most likely logged in the TaskManager logs.
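
A minimal sketch of how the aggregated log could be narrowed down to the lines that record why a container stopped. The function name shutdown_reasons and the chosen patterns are assumptions taken from the log extract earlier in this thread, not an official Flink or YARN tool, and other Flink versions may word these messages differently.

```shell
# Keep only the log lines that say why a Flink-on-YARN container stopped.
# The patterns come from the log extract in this thread (assumption:
# other Flink versions may phrase these messages differently).
shutdown_reasons() {
  grep -E 'final application status|requested shutdown|switched to FAILED'
}

# Assumed usage, with APP_ID set to the YARN application id:
#   yarn logs -applicationId "$APP_ID" | shutdown_reasons
```

Filtering on the "final application status" line shows the diagnostics string first, which is usually a quicker starting point than reading the stack traces.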


On Fri, Aug 28, 2015 at 10:57 AM, LINZ, Arnaud <[hidden email]> wrote:

(…)

RE: "Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

LINZ, Arnaud

Hi Robert,

 

As we saw together, my mistake was to launch the job in detached mode (-yd) while my main function did not wait after execution and ended immediately. Sorry for misunderstanding this option.
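
For reference, a small shell sketch of the attached submission this resolution seems to point to: the same flags as the original command with -yd dropped. The function name flink_attached_cmd and the class and jar names in the usage comment are hypothetical, and the function only prints the command so it can be reviewed before being run.

```shell
# Print a `flink run` command line with the flags from the original post,
# minus -yd, so the client stays attached to the job.
# $1 = main class, $2 = job jar, remaining arguments = program options.
flink_attached_cmd() {
  main_class=$1
  jar=$2
  shift 2
  echo flink run -m yarn-cluster -yn 2 -ytm 1500 -yqu default -ys 4 \
    --class "$main_class" "$jar" "$@"
}

# Assumed usage (hypothetical class and jar names):
#   flink_attached_cmd com.example.MyBatchJob my-batch-job.jar
```

Whether dropping -yd or keeping it and changing the program is the better choice depends on how the job is meant to be operated.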

 

Best regards,

Arnaud

 

From: Robert Metzger [mailto:[hidden email]]
Sent: Friday, August 28, 2015 11:03
To: [hidden email]
Subject: Re: "Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

 

(…)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,511 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3)

09:56:52,511 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (1/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,515 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3)

09:56:52,515 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 2) (2/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,518 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to fail task externally CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3)

09:56:52,519 INFO  org.apache.flink.runtime.taskmanager.Task                     - CHAIN DataSource (at createInput(ExecutionEnvironment.java:502) (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat)) -> FlatMap (FlatMap at readTable(HiveDAO.java:107)) -> Map (Key Extractor 1) (1/3) switched to FAILED with exception.

java.lang.Exception: TaskManager is shutting down.

        at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:195)

        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)

        at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:114)

        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)

        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)

        at akka.actor.ActorCell.terminate(ActorCell.scala:369)

        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)

        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)

        at akka.dispatch.Mailbox.run(Mailbox.scala:220)

        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)

        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

09:56:52,528 INFO  org.apache.flink.yarn.YarnTaskManager                         - Disassociating from JobManager

09:56:53,242 INFO  org.apache.flink.runtime.blob.BlobCache                       - Downloading fa68a8a2d6075c8e3692e1f1ac34dc2dba3d201e from /172.19.115.51:38505

09:56:53,257 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful shutdown (took 10 ms).

09:56:53,263 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful shutdown (took 4 ms).

 

 




The integrity of this message cannot be guaranteed on the Internet. The company that sent this message cannot therefore be held liable for its content nor attachments. Any unauthorized use or dissemination is prohibited. If you are not the intended recipient of this message, then please delete it and notify the sender.

 


Re: "Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

rmetzger0
Hi,
no problem. The behavior is not documented and I also needed some time to figure this out ;)

I'm already preparing a pull request to add a note to the documentation.

On Fri, Aug 28, 2015 at 4:41 PM, LINZ, Arnaud <[hidden email]> wrote:

Hi Robert,

 

As we saw together, my mistake was to launch the job in detached mode (-yd) while my main function did not wait after execution and ended immediately. Sorry for misunderstanding this option.

 

Best regards,

Arnaud

 

From: Robert Metzger [mailto:[hidden email]]
Sent: Friday, August 28, 2015 11:03
To: [hidden email]
Subject: Re: "Flink YARN Client requested shutdown" in flink -m yarn-cluster mode?

 

Is the log from 0.9-SNAPSHOT or 0.10-SNAPSHOT?

 

Can you send me (if you want privately as well) the full log of the yarn application:

 

yarn logs -applicationId <appId>

 

We need to find out why the TaskManagers are shutting down. That is most likely logged in the TaskManager logs.
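
For a long log, a small filter can pull out the line that states the shutdown reason. The following is only a minimal sketch: it assumes the shutdown line is worded like the "Stopping YARN TaskManager with final application status ... and diagnostics: ..." line in the log extract posted in this thread, and the class and method names (ShutdownReasonFinder, findShutdownReasons) are hypothetical helpers, not part of Flink or YARN.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShutdownReasonFinder {

    // Assumed wording, copied from the log extract in this thread:
    // "... - Stopping YARN TaskManager with final application status FAILED and diagnostics: <text>"
    private static final Pattern SHUTDOWN = Pattern.compile(
            "Stopping YARN TaskManager with final application status (\\w+) and diagnostics: (.*)$");

    /** Returns one "STATUS: diagnostics" entry per matching log line, in input order. */
    public static List<String> findShutdownReasons(List<String> logLines) {
        List<String> reasons = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SHUTDOWN.matcher(line);
            if (m.find()) {
                reasons.add(m.group(1) + ": " + m.group(2).trim());
            }
        }
        return reasons;
    }

    public static void main(String[] args) throws IOException {
        // Reads the aggregated log from standard input and prints every shutdown reason found.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        for (String reason : findShutdownReasons(lines)) {
            System.out.println(reason);
        }
    }
}
```

The output of the yarn logs command above could be piped into this program; lines that do not match the assumed wording are simply ignored.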

 

 

On Fri, Aug 28, 2015 at 10:57 AM, LINZ, Arnaud <[hidden email]> wrote:

Hello,

 

I’ve moved my version from 0.9.0 and tried both 0.9-SNAPSHOT & 0.10-SNAPSHOT to continue my batch execution on my secured cluster thanks to [FLINK-2555].

My application works nicely in local mode and also in yarn mode using a job container started with yarn-session.sh, but it fails in -m yarn-cluster mode.

 

Yarn logs indicate that  “Flink YARN Client requested shutdown” but I did nothing like that (or not intentionally). The nodes are not even starting and the exec() does not return any JobExecutionResult.

 

My command line was :

flink run -m yarn-cluster -yd -yn 2 -ytm 1500 -yqu default -ys 4 --class <myMainClass> <myJar> <some options>

 

Any idea what I’ve done wrong?

 

Greetings,

Arnaud

 
