Can't run flink on yarn on version 1.2.0

Can't run flink on yarn on version 1.2.0

Howard,Li(vip.com)

Hi,

I’m trying to run Flink on YARN using the command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

But I get the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

     at java.lang.reflect.Method.invoke(Method.java:498)

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)

     ... 6 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)

     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)

     ... 21 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

     at scala.concurrent.Await$.result(package.scala:107)

     at scala.concurrent.Await.result(package.scala)

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)

     ... 23 more

2017-02-17 15:53:20,084 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:53:20,085 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:53:20,088 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:53:20,089 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0017 failed 1 times due to AM Container for appattempt_1487247313588_0017_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18733,containerID=container_1487247313588_0017_01_000001] is running beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0017_01_000001 :

     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

     |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

     |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:53:20,102 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,106 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:20,107 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,108 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:53:20,112 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

Listening for transport dt_socket at address: 5006

2017-02-17 15:53:20,624 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:21,124 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:21,645 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:22,145 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,664 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:24,185 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:25,204 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

 

The main error is: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. It may be related to https://issues.apache.org/jira/browse/FLINK-2821. That issue says the akka address will always contain the IP rather than the hostname, but I find a hostname in the akka address in leaderRetrievalService.

This problem does not occur in 1.1.4.
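
Separately from the akka address question, the YARN diagnostics in the log above report that the ApplicationMaster container was killed for "running beyond virtual memory limits" (2.2 GB of 2.1 GB). A JobManager that has been killed would also be unreachable from the client, so this may be worth ruling out first. The arithmetic can be checked with a minimal sketch; the byte counts are copied from the process-tree dump, the ratio of 2.1 is assumed to be YARN's default for yarn.nodemanager.vmem-pmem-ratio (the "2.1 GB" limit on a 1 GB container is consistent with it), and the constant and function names are purely illustrative:

```python
# Figures copied from the YARN diagnostics in the log above.
PHYSICAL_LIMIT_BYTES = 1 * 1024**3  # "1 GB physical memory"
VMEM_PMEM_RATIO = 2.1               # assumption: default yarn.nodemanager.vmem-pmem-ratio
JAVA_VMEM_BYTES = 2298933248        # VMEM_USAGE(BYTES) of pid 18740 (java)
BASH_VMEM_BYTES = 108605440         # VMEM_USAGE(BYTES) of pid 18733 (bash)


def vmem_limit_bytes(physical_limit: int, ratio: float) -> int:
    """Virtual memory ceiling for a container: physical limit times the ratio."""
    return int(physical_limit * ratio)


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB for display."""
    return n_bytes / 1024**3


limit = vmem_limit_bytes(PHYSICAL_LIMIT_BYTES, VMEM_PMEM_RATIO)
used = JAVA_VMEM_BYTES + BASH_VMEM_BYTES  # whole process tree of the container

print(f"limit: {to_gib(limit):.1f} GiB, used: {to_gib(used):.1f} GiB")
print("over limit:", used > limit)
```

If this turns out to be the cause, the relevant NodeManager settings in yarn-site.xml are yarn.nodemanager.vmem-pmem-ratio and yarn.nodemanager.vmem-check-enabled. Note that the JVM heap here is only 424 MB (-Xmx424M), so the overrun is in reserved virtual address space rather than in memory actually used (264.7 MB physical).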

 

Thank you all.

 

Howard


Re: Can't run flink on yarn on version 1.2.0

elmosca
Hi Howard,

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

It mentions 1.1.4, rather than 1.2, in the path of the conf directory.
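
A quick way to check for this kind of mix is to pull the Flink version out of every install path in the client log and see which versions show up. The following is only a sketch: the helper name and the regular expression are illustrative, and it assumes the install directories are named flink-<version>, as they are in the log quoted here. The sample lines are trimmed copies of two lines from that log.

```python
import re
from typing import Set

# Hypothetical helper: matches install directories such as
# /home/software/flink-1.1.4/ and captures the version part.
FLINK_DIR = re.compile(r"/flink-(\d+\.\d+\.\d+)/")


def flink_versions(log_text: str) -> Set[str]:
    """Return every distinct Flink version found in install paths."""
    return set(FLINK_DIR.findall(log_text))


sample = (
    "WARN  org.apache.flink.yarn.YarnClusterDescriptor - The configuration "
    "directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and "
    "Logback configuration files.\n"
    "INFO  org.apache.flink.yarn.Utils - Copying from "
    "file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar\n"
)

versions = flink_versions(sample)
print(sorted(versions))
```

More than one version in the result, or a single version that is not the one you meant to run, would point to a mixed setup.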

Cheers,

Bruno

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:


Re: Can't run flink on yarn on version 1.2.0

Howard,Li(vip.com)
In reply to this post by Howard,Li(vip.com)

Sorry for the confusion. I copied the wrong log, but we do hit this problem on 1.2.0.

With version 1.1.4, however, we hit it in one cluster but not in another. We are still trying to figure out what happened.

The following is the log for version 1.2.0:

 

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager count = 2

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    JobManager memory = 1024

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager memory = 1024

2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar

2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties

2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml

2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib

2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar

2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml

2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0016

Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)

         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)

         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)

         at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)

         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)

         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)

         at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

         at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

         at java.lang.reflect.Method.invoke(Method.java:498)

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)

         ... 13 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)

         at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

         ... 28 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.result(package.scala:190)

         at scala.concurrent.Await.result(package.scala)

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)

         ... 30 more

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://[hidden email]:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://[hidden email]:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://[hidden email]:55926/user/jobmanager.

2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://[hidden email]:55926/user/jobmanager.

2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://[hidden email]:55926/user/jobmanager.

2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://[hidden email]:55926/user/jobmanager.

2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://[hidden email]:55926/user/jobmanager.

2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.YarnClusterClient                       - Error while stopping YARN cluster.

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.ready(package.scala:169)

         at scala.concurrent.Await.ready(package.scala)

         at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)

         at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)

         at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.YarnClusterClient                       - Deleting files in hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016

2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1487247313588_0016 finished with state FAILED and final state FAILED at 1487317906227

2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.YarnClusterClient                       - Application failed. Diagnostics Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.YarnClusterClient                       - If log aggregation is activated in the Hadoop cluster, we recommend to retrieve the full application log using this command:

         yarn logs -applicationId application_1487247313588_0016

(It sometimes takes a few seconds until the logs are aggregated)

2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
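
The diagnostics above say the ApplicationMaster container was killed for exceeding its virtual memory limit (2.2 GB used of 2.1 GB allowed), which may also explain why the client later times out while retrieving the leader gateway. The following is a minimal Python sketch, not part of the original mail, that pulls the numbers out of such a diagnostics line. The regular expression and the `parse_usage` helper are hypothetical and based only on the line format shown in this log.

```python
import re

# Diagnostics line copied from the log above.
DIAGNOSTICS = (
    "Container [pid=18590,containerID=container_1487247313588_0016_01_000001] "
    "is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB "
    "physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container."
)

# Only the two units that appear in this log are handled (assumption).
UNIT_TO_MB = {"MB": 1.0, "GB": 1024.0}

PATTERN = re.compile(
    r"(?P<used>[\d.]+) (?P<used_unit>MB|GB) of "
    r"(?P<limit>[\d.]+) (?P<limit_unit>MB|GB) "
    r"(?P<kind>physical|virtual) memory used"
)


def parse_usage(diagnostics):
    """Return {kind: (used_mb, limit_mb)} for each memory kind in the line."""
    usage = {}
    for m in PATTERN.finditer(diagnostics):
        used = float(m["used"]) * UNIT_TO_MB[m["used_unit"]]
        limit = float(m["limit"]) * UNIT_TO_MB[m["limit_unit"]]
        usage[m["kind"]] = (used, limit)
    return usage


usage = parse_usage(DIAGNOSTICS)
# Ratio between the virtual and the physical limit of the container.
vmem_pmem_ratio = usage["virtual"][1] / usage["physical"][1]
print("virtual/physical limit ratio:", round(vmem_pmem_ratio, 2))
print("virtual memory over limit:", usage["virtual"][0] > usage["virtual"][1])
```

For this log the virtual limit is 2.1 times the physical limit, which matches what I understand to be YARN's default `yarn.nodemanager.vmem-pmem-ratio`. Whether to raise that ratio or to disable the check (`yarn.nodemanager.vmem-check-enabled`) is a cluster decision that this thread does not settle.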

 

 

From: Bruno Aranda [mailto:[hidden email]]
Sent: February 17, 2017 17:02
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

 

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

 

Where it mentions 1.1.4 in the folder for the conf dir instead of 1.2.

 

Cheers,

 

Bruno

 

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:

Hi,

         I’m trying to run flink on yarn by using command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

         But I got the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

     at java.lang.reflect.Method.invoke(Method.java:498)

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)

     ... 6 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)

     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)

     ... 21 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

     at scala.concurrent.Await$.result(package.scala:107)

     at scala.concurrent.Await.result(package.scala)

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)

     ... 23 more

2017-02-17 15:53:20,084 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:53:20,085 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:53:20,088 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:53:20,089 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0017 failed 1 times due to AM Container for appattempt_1487247313588_0017_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18733,containerID=container_1487247313588_0017_01_000001] is running beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0017_01_000001 :

     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

     |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

     |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:53:20,102 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,106 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:20,107 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,108 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:53:20,112 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

Listening for transport dt_socket at address: 5006

2017-02-17 15:53:20,624 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:21,124 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:21,645 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:22,145 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,664 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:24,185 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:25,204 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

 

The main error is: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. It may be related to https://issues.apache.org/jira/browse/FLINK-2821, which says that the akka address always contains the IP rather than the hostname. However, I find a hostname in the akka address in leaderRetrievalService.

 

This problem does not appear in 1.1.4.

 

Thank you all.

 

Howard

This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.


Re: Can't run flink on yarn on version 1.2.0

Till Rohrmann
Hi Howard,

could you check whether the JobManager's actor system was bound to "vip-rc-vsubu.vclound.com:55926"? You should see that in the job manager logs. Furthermore, have you checked that your Yarn cluster nodes are actually reachable from the node where you start the Flink application? If so, the logs of the cli client as well as the JobManager logs (both on debug level) would be tremendously helpful.

Cheers,
Till
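
The reachability check suggested above can be sketched in a few lines of Python run from the node where the Flink client is started. This is only an illustration under assumptions: `is_reachable` is a hypothetical helper, and the host and port in the comment are the ones printed by the client in this thread, which were only valid while that application was running.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and connects over TCP.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call with the address from the client output in this thread:
# is_reachable("vip-rc-vsubu.vclound.com", 55926)
```

A `False` result only says that the TCP connection could not be opened from this node. It does not distinguish a name resolution problem from a firewall or from a JobManager that has already been killed.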

On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <[hidden email]> wrote:

Sorry for the confusion. I did copy the wrong log, but we do hit this problem on 1.2.0.

On version 1.1.4, however, we see this in one cluster but not in another. We are still trying to figure out what happened.

 

The following is the log for 1.2.0 version:

 

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager count = 2

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    JobManager memory = 1024

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager memory = 1024

2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar

2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties

2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml

2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib

2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar

2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml

2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0016

Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)

         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)

         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)

         at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)

         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)

         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)

         at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

         at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

         at java.lang.reflect.Method.invoke(Method.java:498)

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)

         ... 13 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)

         at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

         ... 28 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.result(package.scala:190)

         at scala.concurrent.Await.result(package.scala)

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)

         ... 30 more

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.YarnClusterClient                       - Error while stopping YARN cluster.

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.ready(package.scala:169)

         at scala.concurrent.Await.ready(package.scala)

         at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)

         at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)

         at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.YarnClusterClient                       - Deleting files in hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016

2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1487247313588_0016 finished with state FAILED and final state FAILED at 1487317906227

2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.YarnClusterClient                       - Application failed. Diagnostics Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.YarnClusterClient                       - If log aggregation is activated in the Hadoop cluster, we recommend to retrieve the full application log using this command:

         yarn logs -applicationId application_1487247313588_0016

(It sometimes takes a few seconds until the logs are aggregated)

2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
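
The YARN diagnostics in the log above report that the application master container was killed for "running beyond virtual memory limits". A small sketch for pulling the numbers out of such a line is shown here. The function names are hypothetical, and the message format is assumed from the lines in this log (Hadoop 2.7.3); other Hadoop versions may word it differently. As a side note, a 2.1 GB virtual limit for a 1 GB container matches YARN's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1, though the thread does not show this cluster's actual setting.

```python
import re

# Matches the usage summary in a YARN diagnostics line, for example:
# "266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used"
USAGE = re.compile(
    r"([\d.]+) ([KMGT]B) of ([\d.]+) ([KMGT]B) physical memory used; "
    r"([\d.]+) ([KMGT]B) of ([\d.]+) ([KMGT]B) virtual memory used"
)

# Conversion factors to megabytes.
TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024.0}


def parse_memory_usage(line):
    """Return used/limit values in MB, or None if the line has no usage summary."""
    match = USAGE.search(line)
    if match is None:
        return None
    p_used, p_used_u, p_lim, p_lim_u, v_used, v_used_u, v_lim, v_lim_u = match.groups()
    return {
        "physical_used_mb": float(p_used) * TO_MB[p_used_u],
        "physical_limit_mb": float(p_lim) * TO_MB[p_lim_u],
        "virtual_used_mb": float(v_used) * TO_MB[v_used_u],
        "virtual_limit_mb": float(v_lim) * TO_MB[v_lim_u],
    }


def over_virtual_limit(line):
    """Return True if the line reports virtual memory usage above its limit."""
    usage = parse_memory_usage(line)
    return usage is not None and usage["virtual_used_mb"] > usage["virtual_limit_mb"]
```

The values YARN prints are rounded, so this comparison is only as precise as the log message itself.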

 

 

From: Bruno Aranda [mailto:[hidden email]]
Sent: February 17, 2017 17:02
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

 

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

 

Where it mentions 1.1.4 in the folder for the conf dir instead of 1.2.
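
A quick way to spot this kind of mix is to collect every Flink version that appears in an install path in the client log. The following is a minimal sketch, not an official tool: the function name is made up, and the pattern assumes install directories named `flink-X.Y.Z` as in the paths quoted in this thread.

```python
import re

# Assumes install directories are named like /home/software/flink-1.2.0/
FLINK_DIR = re.compile(r"/flink-(\d+\.\d+\.\d+)/")


def flink_versions_in_log(lines):
    """Return the distinct Flink versions found in install paths, sorted as strings."""
    versions = set()
    for line in lines:
        versions.update(FLINK_DIR.findall(line))
    return sorted(versions)
```

If the result contains more than one version for a single run, the client is probably picking up files from two installations.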

 

Cheers,

 

Bruno

 

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:

Hi,

         I’m trying to run flink on yarn by using command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

         But I got the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

     at java.lang.reflect.Method.invoke(Method.java:498)

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)

     ... 6 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)

     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)

     ... 21 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

     at scala.concurrent.Await$.result(package.scala:107)

     at scala.concurrent.Await.result(package.scala)

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)

     ... 23 more

2017-02-17 15:53:20,084 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:53:20,085 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:53:20,088 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:53:20,089 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0017 failed 1 times due to AM Container for appattempt_1487247313588_0017_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18733,containerID=container_1487247313588_0017_01_000001] is running beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0017_01_000001 :

     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

     |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

     |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:53:20,102 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,106 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:20,107 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,108 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:53:20,112 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

Listening for transport dt_socket at address: 5006

2017-02-17 15:53:20,624 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:21,124 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:21,645 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:22,145 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,664 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:24,185 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:25,204 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

 

The main error is: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. It may be related to https://issues.apache.org/jira/browse/FLINK-2821. That issue says the akka address always contains the IP rather than the hostname, but I find a hostname in the akka address in leaderRetrievalService.

 

This problem does not appear in 1.1.4.

 

Thank you all.

 

Howard

This email may be confidential. If you are not the intended recipient of this email, please notify me immediately. Please do not use, save, copy, print, or distribute this email and its contents, use them for any other purpose, or disclose them to anyone. Thank you for your cooperation! This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.


Re: Can't run flink on yarn on version 1.2.0

Howard,Li(vip.com)
In reply to this post by Howard,Li(vip.com)

Hi All:

         We finally found the problem.

         Flink on Yarn only works on JDK7, not on JDK8. If you use JDK8, you may run into the problem discussed before.

         For more information: OS: CentOS 6.6. JDK7 version: 1.7.0u75. JDK8 version: 1.8.0u111.

         This problem may be related to akka.
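
For context, the YARN diagnostics in the 1.2.0 log quoted below give the immediate reason the ApplicationMaster container was killed: it was "running beyond virtual memory limits" (2.2 GB used of 2.1 GB allowed). The arithmetic can be sketched in Python as follows. The byte counts are copied from the process-tree dump in that log. The 2.1 ratio is an assumption: it is the default of yarn.nodemanager.vmem-pmem-ratio, and this cluster's actual setting is not shown in the thread. Summing the two processes in the tree is also an assumption; it happens to match the reported 2.2 GB.

```python
# Sketch of the YARN virtual-memory check, using numbers from the quoted log.
GIB = 1024 ** 3

CONTAINER_PMEM_BYTES = 1 * GIB   # "1 GB physical memory" from the diagnostics
VMEM_PMEM_RATIO = 2.1            # ASSUMED: default yarn.nodemanager.vmem-pmem-ratio
JAVA_VMEM_BYTES = 2294116352     # VMEM_USAGE(BYTES) of the java process (pid 18598)
BASH_VMEM_BYTES = 108605440      # VMEM_USAGE(BYTES) of the bash wrapper (pid 18590)


def vmem_limit_bytes(pmem_bytes: int, ratio: float) -> int:
    """Virtual memory allowed for a container with the given physical memory."""
    return int(pmem_bytes * ratio)


def over_limit(used_bytes: int, limit_bytes: int) -> bool:
    """True if the container would be killed by the virtual-memory check."""
    return used_bytes > limit_bytes


limit = vmem_limit_bytes(CONTAINER_PMEM_BYTES, VMEM_PMEM_RATIO)
used = JAVA_VMEM_BYTES + BASH_VMEM_BYTES  # ASSUMED: usage is summed over the process tree

print(f"limit: {limit / GIB:.2f} GiB, used: {used / GIB:.2f} GiB, "
      f"over limit: {over_limit(used, limit)}")
```

Note that the JVM was started with -Xmx424M and used only about 266 MB of physical memory, so the kill is about reserved virtual address space, not heap usage. Whether JDK8 reserves more virtual memory than JDK7 on this OS is not confirmed anywhere in this thread.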

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: 2017-02-17 18:33
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

could you check whether the JobManager's actor system was bound to "vip-rc-vsubu.vclound.com:55926"? You should see that in the job manager logs. Furthermore, have you checked that your Yarn cluster nodes are actually reachable from the node where you start the Flink application? If so, the logs of the cli client as well as the JobManager logs (both on debug level) would be tremendously helpful.
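
A rough way to test the reachability question from the client node is a plain TCP connect. The following is a minimal sketch in Python, not part of Flink; the helper name is_reachable and the 3-second timeout are made up for illustration. A successful connect only shows that something is listening on that port, not that it is the JobManager.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the hostname and connects; closing on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call with the address printed by the CLI in this thread:
#   is_reachable("vip-rc-vsubu.vclound.com", 55926)
```

Since the JobManager port changes with every deployment, the check has to be run while the application is still up and with the port from that run's "Using address ... to connect to JobManager" line.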

 

Cheers,

Till

 

On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <[hidden email]> wrote:

Sorry for the confusion. I did copy the wrong log, but we do hit this problem on 1.2.0.

For version 1.1.4, however, we see it in one cluster but not in another. We are still trying to figure out what happened.

 

The following is the log for 1.2.0 version:

 

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager count = 2

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    JobManager memory = 1024

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager memory = 1024

2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar

2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties

2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml

2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib

2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar

2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml

2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0016

Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)

         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)

         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)

         at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)

         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)

         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)

         at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

         at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

         at java.lang.reflect.Method.invoke(Method.java:498)

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)

         ... 13 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)

         at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

         ... 28 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.result(package.scala:190)

         at scala.concurrent.Await.result(package.scala)

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)

         ... 30 more

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@...:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@...:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.YarnClusterClient                       - Error while stopping YARN cluster.

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.ready(package.scala:169)

         at scala.concurrent.Await.ready(package.scala)

         at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)

         at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)

         at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.YarnClusterClient                       - Deleting files in hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016

2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1487247313588_0016 finished with state FAILED and final state FAILED at 1487317906227

2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.YarnClusterClient                       - Application failed. Diagnostics Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.YarnClusterClient                       - If log aggregation is activated in the Hadoop cluster, we recommend to retrieve the full application log using this command:

         yarn logs -applicationId application_1487247313588_0016

(It sometimes takes a few seconds until the logs are aggregated)

2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

 

 

From: Bruno Aranda [mailto:[hidden email]]
Sent: 2017-02-17 17:02
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

 

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

 

Where it mentions 1.1.4 in the folder for the conf dir instead of 1.2.

 

Cheers,

 

Bruno

 

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:

Hi,

         I’m trying to run flink on yarn by using command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

         But I got the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

     at java.lang.reflect.Method.invoke(Method.java:498)

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:510)

     ... 6 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:127)

     at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:645)

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:237)

     ... 21 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

     at scala.concurrent.Await$.result(package.scala:107)

     at scala.concurrent.Await.result(package.scala)

     at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:125)

     ... 23 more

2017-02-17 15:53:20,084 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:53:20,085 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:53:20,088 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:53:20,089 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0017 failed 1 times due to AM Container for appattempt_1487247313588_0017_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0017Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18733,containerID=container_1487247313588_0017_01_000001] is running beyond virtual memory limits. Current usage: 264.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0017_01_000001 :

     |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

     |- 18740 18733 18733 18733 (java) 955 64 2298933248 67430 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

     |- 18733 18731 18733 18733 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0017/container_1487247313588_0017_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:53:20,102 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,106 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:20,107 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@10.199.202.162:43809/user/jobmanager with session ID null.

2017-02-17 15:53:20,108 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:53:20,112 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

Listening for transport dt_socket at address: 5006

2017-02-17 15:53:20,624 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:21,124 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:21,645 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:22,145 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:23,664 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@10.199.202.162:43809/user/jobmanager.

2017-02-17 15:53:24,185 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:53:25,204 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

 

The main error is: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. It may be related to https://issues.apache.org/jira/browse/FLINK-2821. That issue says the akka address always uses the IP, not the hostname, but I find a hostname in the akka address in leaderRetrievalService.
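The YARN diagnostics in the log above also report that the application master container was killed for "running beyond virtual memory limits", which may be why the client could no longer reach the JobManager. A minimal Python sketch of that arithmetic follows. The byte counts are copied from the process-tree dump above; the ratio of 2.1 is an assumption (the default of `yarn.nodemanager.vmem-pmem-ratio`), since the cluster's yarn-site.xml is not shown in this thread. The function name `vmem_report` is hypothetical.

```python
# Figures copied from the YARN diagnostics quoted in this thread.
PHYSICAL_LIMIT_BYTES = 1 * 1024**3  # container size: "1 GB physical memory"
VMEM_PMEM_RATIO = 2.1               # ASSUMPTION: default yarn.nodemanager.vmem-pmem-ratio

# VMEM_USAGE(BYTES) column of the process-tree dump for the AM container.
process_vmem_bytes = {
    "java (pid 18740)": 2298933248,
    "bash (pid 18733)": 108605440,
}


def vmem_report(process_vmem, physical_limit, ratio):
    """Return (bytes used by the whole process tree, byte limit, limit exceeded?)."""
    used = sum(process_vmem.values())
    limit = physical_limit * ratio
    return used, limit, used > limit


used, limit, over = vmem_report(process_vmem_bytes, PHYSICAL_LIMIT_BYTES, VMEM_PMEM_RATIO)
print(f"virtual memory used : {used / 1024**3:.2f} GiB")
print(f"virtual memory limit: {limit / 1024**3:.2f} GiB")
print(f"over the limit      : {over}")
```

If this is what is happening, the settings usually looked at are `yarn.nodemanager.vmem-pmem-ratio` and `yarn.nodemanager.vmem-check-enabled` in yarn-site.xml. Whether changing them is appropriate depends on the cluster.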

 

This problem won’t appear in 1.1.4.

 

Thank you all.

 

Howard

This email may be confidential. If you are not the intended recipient, please notify me immediately. Please do not use, save, copy, print, or distribute this email or its contents, use it for any other purpose, or disclose it to anyone. Thank you for your cooperation! This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.


Re: Can't run flink on yarn on version 1.2.0

elmosca
Hi,

Good you found a solution, but are you sure it is the JDK version?

We are running Flink 1.2.0 on Yarn on an AWS EMR Cluster with no issues, using JDK 8 (1.8.0_121).

Cheers,

Bruno

On Thu, 23 Feb 2017 at 09:26 Howard,Li(vip.com) <[hidden email]> wrote:

Hi All:

         We finally found the problem.

         Flink on Yarn only works with JDK7, not JDK8. If you use JDK8, you may run into the problem discussed before.

         For more information: OS: CentOS 6.6. JDK7 version: 1.7.0u75. JDK8 version: 1.8.0u111.

         This problem may be related to akka.

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: February 17, 2017 18:33
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

could you check whether the JobManager's actor system was bound to "vip-rc-vsubu.vclound.com:55926"? You should see that in the job manager logs. Furthermore, have you checked that your Yarn cluster nodes are actually reachable from the node where you start the Flink application? If so, the logs of the cli client as well as the JobManager logs (both on debug level) would be tremendously helpful.
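The reachability check suggested above could be done with a small Python helper run on the node where the Flink client is started. This is only a sketch: `is_reachable` is a hypothetical helper name, and a successful TCP connect only shows that something accepts connections on that port, not that the JobManager's actor system accepts the address the client uses.

```python
import socket
import sys


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and hostname resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_jm.py <host> <port>  (the script name is arbitrary)
    host, port = sys.argv[1], int(sys.argv[2])
    print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

For this thread, the host and port would be the ones printed in the "Using address ... to connect to JobManager" line of the client log, for example vip-rc-vsubu.vclound.com and 55926.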

 

Cheers,

Till

 

On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <[hidden email]> wrote:

Sorry for the confusion I caused. I did copy the wrong log, but we do hit this problem on 1.2.0.

With version 1.1.4, however, we see it in one cluster but not in another. We are still trying to figure out what happened.

 

The following is the log for 1.2.0 version:

 

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager count = 2

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    JobManager memory = 1024

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager memory = 1024

2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar

2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties

2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml

2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib

2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar

2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml

2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0016

Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)

         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)

         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)

         at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)

         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)

         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)

         at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

         at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

         at java.lang.reflect.Method.invoke(Method.java:498)

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)

         ... 13 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)

         at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

         ... 28 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.result(package.scala:190)

         at scala.concurrent.Await.result(package.scala)

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)

         ... 30 more

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@...:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@...:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@...:55926/user/jobmanager.

2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.YarnClusterClient                       - Error while stopping YARN cluster.

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.ready(package.scala:169)

         at scala.concurrent.Await.ready(package.scala)

         at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)

         at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)

         at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.YarnClusterClient                       - Deleting files in hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016

2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1487247313588_0016 finished with state FAILED and final state FAILED at 1487317906227

2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.YarnClusterClient                       - Application failed. Diagnostics Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.YarnClusterClient                       - If log aggregation is activated in the Hadoop cluster, we recommend to retrieve the full application log using this command:

         yarn logs -applicationId application_1487247313588_0016

(It sometimes takes a few seconds until the logs are aggregated)

2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

 

 

From: Bruno Aranda [mailto:[hidden email]]
Sent: February 17, 2017 17:02
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

 

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

 

Where it mentions 1.1.4 in the folder for the conf dir instead of 1.2.
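One quick way to spot such a mix is to list every Flink version that shows up in install paths in a client log. The Python sketch below assumes the install directories follow the `/flink-<version>/` naming seen in this thread; `flink_versions` is a hypothetical helper, and the two sample lines are shortened from the logs quoted here.

```python
import re

# Matches install directories such as /home/software/flink-1.1.4/conf.
# File names like flink-dist_2.11-1.2.0.jar are deliberately not matched.
FLINK_DIR = re.compile(r"/flink-(\d+\.\d+\.\d+)/")


def flink_versions(log_lines):
    """Return the sorted, distinct Flink versions found in directory paths."""
    versions = set()
    for line in log_lines:
        versions.update(FLINK_DIR.findall(line))
    return sorted(versions)


sample = [
    "The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files.",
    "Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib",
]
print(flink_versions(sample))
```

More than one version in the result for a single run would suggest that the client is picking up files from different installations.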

 

Cheers,

 

Bruno

 

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:

Hi,

         I’m trying to run flink on yarn by using command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

         But I got the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Meth


Re: Can't run flink on yarn on version 1.2.0

rmetzger0
Hi,

Were both JDKs from the same vendor (say, OpenJDK)? Were both installed "vanilla" from the package manager?
Java is usually pretty good with backwards compatibility.
I think this issue is caused by some other effect that we are overlooking here.

On Thu, Feb 23, 2017 at 10:43 AM, Bruno Aranda <[hidden email]> wrote:
Hi,

Good you found a solution, but are you sure it is the JDK version?

We are running Flink 1.2.0 on Yarn on an AWS EMR Cluster with no issues, using JDK 8 (1.8.0_121).

Cheers,

Bruno

On Thu, 23 Feb 2017 at 09:26 Howard,Li(vip.com) <[hidden email]> wrote:

Hi All:

         We finally found the problem.

         Flink on YARN works only on JDK 7, not on JDK 8. If you use JDK 8, you may hit the problem discussed earlier.

         For more information: OS: CentOS 6.6. JDK 7 version: 1.7.0u75. JDK 8 version: 1.8.0u111.

         This problem may be related to Akka.

 

From: Till Rohrmann [mailto:[hidden email]]
Sent: February 17, 2017 18:33

To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

Could you check whether the JobManager's actor system was bound to "vip-rc-vsubu.vclound.com:55926"? You should see that in the JobManager logs. Furthermore, have you checked that your YARN cluster nodes are actually reachable from the node where you start the Flink application? If so, the logs of the CLI client as well as the JobManager logs (both on debug level) would be tremendously helpful.

 

Cheers,

Till
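
Till's two checks (which address the JobManager is bound to, and whether that address is reachable from the submitting node) can be approximated from the client side. The following is a minimal Python sketch, not part of Flink: `jobmanager_endpoint` and `can_connect` are hypothetical helper names, and the address format is the one printed in the 1.2.0 client log in this thread. A successful TCP connect only shows that something is listening on that port; it does not prove that the JobManager actor system is healthy.

```python
import socket
from urllib.parse import urlsplit


def jobmanager_endpoint(akka_address):
    """Extract (host, port) from an address such as
    akka.tcp://flink@host:55926/user/jobmanager (format taken from the log)."""
    parts = urlsplit(akka_address)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host:port found in {akka_address!r}")
    return parts.hostname, parts.port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds in time.

    DNS failures, refused connections and timeouts all count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the node where `bin/flink` is started, `can_connect(*jobmanager_endpoint("akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager"))` would return `False` if the host name does not resolve there or the port is not open.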

 

On Fri, Feb 17, 2017 at 10:41 AM, Howard,Li(vip.com) <[hidden email]> wrote:

Sorry for the confusion I caused. I did copy the wrong log, but we do hit this problem on 1.2.0.

On version 1.1.4, however, we see this in one cluster but not in another. We are still trying to figure out what happened.

 

The following is the log for version 1.2.0:

 

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,775 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:51:37,803 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager count = 2

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    JobManager memory = 1024

2017-02-17 15:51:37,804 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -    TaskManager memory = 1024

2017-02-17 15:51:37,827 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:51:38,672 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:51:38,685 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/examples/batch/WordCount.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/WordCount.jar

2017-02-17 15:51:38,992 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/log4j.properties

2017-02-17 15:51:39,058 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/logback.xml

2017-02-17 15:51:39,085 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/lib

2017-02-17 15:51:39,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-dist_2.11-1.2.0.jar

2017-02-17 15:51:40,493 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.2.0/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016/flink-conf.yaml

2017-02-17 15:51:40,547 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0016

2017-02-17 15:51:40,585 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:51:40,587 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:51:45,879 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0016

Using address vip-rc-vsubu.vclound.com:55926 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0016/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:51:46,704 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:51:47,029 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)

         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)

         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)

         at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:412)

         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)

         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)

         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)

         at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

         at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

         at java.lang.reflect.Method.invoke(Method.java:498)

         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)

         ... 13 more

Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)

         at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)

         at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

         ... 28 more

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.result(package.scala:190)

         at scala.concurrent.Await.result(package.scala)

         at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)

         ... 30 more

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Sending shutdown request to the Application Master

2017-02-17 15:52:21,145 INFO  org.apache.flink.yarn.YarnClusterClient                       - Start application client.

2017-02-17 15:52:21,151 WARN  org.apache.flink.yarn.YarnClusterClient                       - YARN reported application state FAILED

2017-02-17 15:52:21,152 WARN  org.apache.flink.yarn.YarnClusterClient                       - Diagnostics: Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:21,160 INFO  org.apache.flink.yarn.ApplicationClient                       - Notification about new leader address akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,163 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:21,164 INFO  org.apache.flink.yarn.ApplicationClient                       - Received address of new leader akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager with session ID null.

2017-02-17 15:52:21,165 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.

2017-02-17 15:52:21,168 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:21,684 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:22,174 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:22,704 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:23,194 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,214 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:24,725 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:25,234 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:26,254 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:27,274 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,294 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:28,744 INFO  org.apache.flink.yarn.ApplicationClient                       - Trying to register at JobManager akka.tcp://flink@vip-rc-vsubu.vclound.com:55926/user/jobmanager.

2017-02-17 15:52:29,314 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:30,334 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:31,155 WARN  org.apache.flink.yarn.YarnClusterClient                       - Error while stopping YARN cluster.

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)

         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

         at scala.concurrent.Await$.ready(package.scala:169)

         at scala.concurrent.Await.ready(package.scala)

         at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)

         at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)

         at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)

         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:263)

         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)

         at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)

         at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)

         at java.security.AccessController.doPrivileged(Native Method)

         at javax.security.auth.Subject.doAs(Subject.java:422)

         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)

2017-02-17 15:52:31,156 INFO  org.apache.flink.yarn.YarnClusterClient                       - Deleting files in hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0016

2017-02-17 15:52:31,354 INFO  org.apache.flink.yarn.ApplicationClient                       - Sending StopCluster request to JobManager.

2017-02-17 15:52:32,163 INFO  org.apache.flink.yarn.YarnClusterClient                       - Application application_1487247313588_0016 finished with state FAILED and final state FAILED at 1487317906227

2017-02-17 15:52:32,163 WARN  org.apache.flink.yarn.YarnClusterClient                       - Application failed. Diagnostics Application application_1487247313588_0016 failed 1 times due to AM Container for appattempt_1487247313588_0016_000001 exited with  exitCode: -103

For more detailed output, check application tracking page:http://vip-rc-ucsww.vclound.com:8088/cluster/app/application_1487247313588_0016Then, click on links to logs of each attempt.

Diagnostics: Container [pid=18590,containerID=container_1487247313588_0016_01_000001] is running beyond virtual memory limits. Current usage: 266.1 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_1487247313588_0016_01_000001 :

         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

         |- 18598 18590 18590 18590 (java) 894 48 2294116352 67782 /home/software/jdk1.8.0_111/bin/java -Xmx424M -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner

         |- 18590 18588 18590 18590 (bash) 0 0 108605440 334 /bin/bash -c /home/software/jdk1.8.0_111/bin/java -Xmx424M  -Dlog.file=/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner  1>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.out 2>/home/software/hadoop-2.7.3/logs/userlogs/application_1487247313588_0016/container_1487247313588_0016_01_000001/jobmanager.err

 

Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

Failing this attempt. Failing the application.

2017-02-17 15:52:32,164 WARN  org.apache.flink.yarn.YarnClusterClient                       - If log aggregation is activated in the Hadoop cluster, we recommend to retrieve the full application log using this command:

         yarn logs -applicationId application_1487247313588_0016

(It sometimes takes a few seconds until the logs are aggregated)

2017-02-17 15:52:32,164 INFO  org.apache.flink.yarn.YarnClusterClient                       - YARN Client is shutting down

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Stopped Application client.

2017-02-17 15:52:32,267 INFO  org.apache.flink.yarn.ApplicationClient                       - Disconnect from JobManager null.
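
The YARN diagnostics in the log above say that the ApplicationMaster container was killed for exceeding its virtual memory limit, not its physical one. The arithmetic can be checked with a short Python sketch that uses only figures copied from that log; the variable names are illustrative, and treating YARN's "GB" as 1024^3 bytes is an assumption. The factor of 2.1 between the two limits matches the default of YARN's `yarn.nodemanager.vmem-pmem-ratio` setting, though whether this cluster uses the default is also an assumption.

```python
GIB = 1024 ** 3  # assumption: YARN reports memory in binary units

# Limits quoted in the diagnostics line.
PHYSICAL_LIMIT_GB = 1.0   # "266.1 MB of 1 GB physical memory used"
VIRTUAL_LIMIT_GB = 2.1    # "2.2 GB of 2.1 GB virtual memory used"

# VMEM_USAGE(BYTES) column of the process-tree dump.
PROCESS_VMEM_BYTES = {
    "java (pid 18598)": 2294116352,
    "bash (pid 18590)": 108605440,
}

# YARN checks the whole process tree of the container, so sum both entries.
total_vmem_gb = sum(PROCESS_VMEM_BYTES.values()) / GIB
implied_ratio = VIRTUAL_LIMIT_GB / PHYSICAL_LIMIT_GB
over_limit = total_vmem_gb > VIRTUAL_LIMIT_GB

print(f"process-tree virtual memory: {total_vmem_gb:.2f} GB")
print(f"virtual limit: {VIRTUAL_LIMIT_GB} GB (ratio {implied_ratio} x physical limit)")
print(f"over the virtual memory limit: {over_limit}")
```

Raising that ratio, or disabling `yarn.nodemanager.vmem-check-enabled` in `yarn-site.xml`, are the usual knobs for this check; neither was tried in this thread.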

 

 

From: Bruno Aranda [mailto:[hidden email]]
Sent: February 17, 2017 17:02
To: [hidden email]
Subject: Re: Can't run flink on yarn on version 1.2.0

 

Hi Howard,

 

We run Flink 1.2 in Yarn without issues. Sorry I don't have any specific solution, but are you sure you don't have some sort of Flink mix? In your logs I can see:

 

The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

 

Note that it mentions 1.1.4 in the path of the conf dir instead of 1.2.

 

Cheers,

 

Bruno
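
Bruno's check for a "Flink mix" can be done mechanically on a saved client log. The following is a minimal Python sketch under stated assumptions: `flink_versions_in_log` is a hypothetical helper, and it assumes that the installation directories are named `flink-<version>` as in the paths shown in this thread. More than one version in the result would suggest that the client is picking up files from two different installations.

```python
import re

# Matches the version in directory names such as ".../flink-1.1.4/conf".
# Jar names like "flink-dist_2.10-1.1.4.jar" are deliberately not matched.
FLINK_DIR = re.compile(r"flink-(\d+\.\d+\.\d+)/")


def flink_versions_in_log(lines):
    """Return the sorted, distinct Flink versions found in install paths."""
    versions = set()
    for line in lines:
        versions.update(FLINK_DIR.findall(line))
    return sorted(versions)


# Two lines copied from the first log in this thread.
SAMPLE = [
    "The configuration directory ('/home/software/flink-1.1.4/conf') contains "
    "both LOG4J and Logback configuration files.",
    "Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar "
    "to hdfs://10.199.202.161:9000/user/root/.flink/"
    "application_1487247313588_0017/flink-dist_2.10-1.1.4.jar",
]

print(flink_versions_in_log(SAMPLE))
```

For a whole file, the same function can be called with an open file object, since it only iterates over lines.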

 

On Fri, 17 Feb 2017 at 08:50 Howard,Li(vip.com) <[hidden email]> wrote:

Hi,

         I’m trying to run Flink on YARN using the command: bin/flink run -m yarn-cluster -yn 2 -ys 4 ./examples/batch/WordCount.jar

         But I got the following error:

 

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,746 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Using values:

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager count = 2

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         JobManager memory = 1024

2017-02-17 15:52:40,775 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   -         TaskManager memory = 1024

2017-02-17 15:52:40,796 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2017-02-17 15:52:41,680 WARN  org.apache.flink.yarn.YarnClusterDescriptor                   - The configuration directory ('/home/software/flink-1.1.4/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.

2017-02-17 15:52:41,702 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/logback.xml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/logback.xml

2017-02-17 15:52:42,025 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/lib

2017-02-17 15:52:42,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/conf/log4j.properties to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/log4j.properties

2017-02-17 15:52:42,722 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/software/flink-1.1.4/lib/flink-dist_2.10-1.1.4.jar to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-dist_2.10-1.1.4.jar

2017-02-17 15:52:43,346 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/software/flink-1.1.4/conf/flink-conf.yaml to hdfs://10.199.202.161:9000/user/root/.flink/application_1487247313588_0017/flink-conf.yaml

2017-02-17 15:52:43,386 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Submitting application master application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1487247313588_0017

2017-02-17 15:52:43,425 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Waiting for the cluster to be allocated

2017-02-17 15:52:43,427 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deploying cluster, current state ACCEPTED

2017-02-17 15:52:48,471 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - YARN application has been deployed successfully.

Cluster started: Yarn cluster with application id application_1487247313588_0017

Using address 10.199.202.162:43809 to connect to JobManager.

JobManager web interface address http://vip-rc-ucsww.vclound.com:8088/proxy/application_1487247313588_0017/

Using the parallelism provided by the remote cluster (8). To use another parallelism, set it at the ./bin/flink client.

Starting execution of program

2017-02-17 15:52:49,278 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting program in interactive mode

Executing WordCount example with default input data set.

Use --input to specify file input.

Printing result to stdout. Use --output to specify output path.

2017-02-17 15:52:49,609 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected

Waiting until all TaskManagers have connected

2017-02-17 15:52:49,610 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.

 

------------------------------------------------------------

The program finished with the following exception:

 

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.

     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)

     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:404)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:321)

     at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:777)

     at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)

     at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1005)

     at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1048)

Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client

     at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)

     at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:514)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)

     at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:204)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:383)

     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:370)

     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)

     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:896)

     at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)

     at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)

     at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:92)

     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Meth