Hi,

Using flink 1.2.0, I faced issue https://issues.apache.org/jira/browse/FLINK-6117. This issue is fixed in version 1.3.0, but I have some reasons to try to find a workaround.

I did:
1. change the source according to https://github.com/apache/flink/commit/eef85e095a8a0e4c4553631b74ba7b9f173cebf0
2. replace $FLINK_HOME/lib/flink-dist_2.11-1.2.0.jar
3. set "zookeeper.sasl.disable: true" in flink-conf.yaml
4. run yarn-session.sh

The original problem (Authentication failed) seems to be gone, but now I get this error:

Exception in thread "main" java.lang.RuntimeException: Failed to retrieve JobManager address
    at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:627)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175)
    at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242)
    ... 9 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at scala.concurrent.Await.result(package.scala)
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173)
    ... 10 more

I believe the related settings (flink, hadoop, zookeeper) are correct, because yarn-session works smoothly with flink 1.3.2 in the same environment.

Does anyone have any inspiration for this error message?

Thanks.
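For reference, step 3 boils down to the following flink-conf.yaml fragment. It is only a sketch: per the discussion in this thread, stock 1.2.0 does not honour this key, so it takes effect only with the cherry-picked commit built into the replaced flink-dist jar.

```yaml
# conf/flink-conf.yaml
# Workaround for FLINK-6117 on 1.2.0: skip SASL when connecting to ZooKeeper.
# Only effective once the cherry-picked fix is in $FLINK_HOME/lib/flink-dist_2.11-1.2.0.jar.
zookeeper.sasl.disable: true
```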
I looked at the commit you cherry-picked and nothing in there explains the error you got. This rather sounds like something might be mixed up between (remaining artefacts of) flink 1.3 and 1.2.

Can you verify that nothing of your flink 1.3 tests remains, e.g. running JobManager or TaskManager instances? Also that you're not accidentally running the yarn-session.sh script of 1.3?

Nico

On Wednesday, 6 September 2017 06:36:42 CEST Sunny Yun wrote:
> Hi,
>
> Using flink 1.2.0, I faced to issue
> https://issues.apache.org/jira/browse/FLINK-6117.
> This issue is fixed at version 1.3.0. But I have some reason to trying to
> find out work around.
> [...]
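Nico's two checks above can be scripted per node roughly like this (a sketch; the helper names are made up for illustration, and `jps` is assumed to ship with the installed JDK):

```shell
#!/bin/sh
# Sketch: verify no leftover Flink JVMs and that the right yarn-session.sh runs.

# 1) Look for leftover Flink processes: a still-running 1.3 JobManager or
#    TaskManager would show up in jps under its main class name.
check_leftover_flink() {
  if command -v jps >/dev/null 2>&1; then
    jps | grep -E 'JobManager|TaskManager' || echo "no leftover Flink JVMs"
  else
    # Fallback without a JDK: match the class name on the full command line.
    ps -ef | grep -E '[J]obManager|[T]askManager' || echo "no leftover Flink JVMs"
  fi
}

# 2) Resolve which yarn-session.sh the shell would actually execute,
#    to rule out accidentally starting the 1.3 script.
check_yarn_session() {
  command -v yarn-session.sh || echo "yarn-session.sh is not on PATH; run it via its full path"
}

check_leftover_flink
check_yarn_session
```

Running it on every node before starting the session should either list the stray JVMs or confirm the environment is clean.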
Nico, thank you for your reply.
> I looked at the commit you cherry-picked and nothing in there explains the error you got.

==> The commit I cherry-picked makes the 'zookeeper.sasl.disable' setting work correctly. I changed flink-dist_2.11-1.2.0.jar according to it, so the zookeeper.sasl problem is now gone. Yes, the error log I posted in the original message is a completely different one.

> Can you verify that nothing of your flink 1.3 tests remains

==> Below is what I just reproduced. I have a 4-node non-secure cluster. After running yarn-session.sh, the JM process is created on node flink-03, but the TM processes are not. Standalone works well.

Any clue would be really appreciated. Thanks.

[bistel@flink-01 ~]$ jps
1888 ResourceManager
2000 NodeManager
2433 NameNode
2546 DataNode
2754 SecondaryNameNode
2891 Jps
1724 QuorumPeerMain

[bistel@flink-02 ~]$ jps
2018 Jps
1721 NodeManager
1881 DataNode
1515 QuorumPeerMain

[bistel@flink-03 ~]$ jps
1521 QuorumPeerMain
1975 Jps
1724 NodeManager
1885 DataNode

[bistel@flink-04 ~]$ jps
2090 Jps
1515 QuorumPeerMain
1789 NodeManager
1950 DataNode

[bistel@flink-01 ~]$ /usr/local/flink-1.2.0/bin/yarn-session.sh -n 4
2017-09-07 09:49:35,467 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:35,468 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:35,468 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:35,468 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:35,468 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:35,469 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:35,469 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:35,469 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:35,469 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:35,470 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:35,471 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,471 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,471 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:35,662 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:35,662 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:35,662 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:35,663 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:35,664 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,665 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:35,665 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:36,519 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-07 09:49:36,779 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to bistel (auth:SIMPLE)
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-01
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 4096
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 8192
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4
2017-09-07 09:49:37,084 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 4
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /usr/local/hadoop/etc/hadoop/
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability, zookeeper
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.quorum, flink-01:2181,flink-02:2181,flink-03:2181,flink-04:2181
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.root, /flink
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.namespace, /cluster_one
2017-09-07 09:49:37,085 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.storageDir, hdfs:///flink/recovery
2017-09-07 09:49:37,086 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-attempts, 10
2017-09-07 09:49:37,086 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.containers.vcores, 20
2017-09-07 09:49:37,086 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.application-master.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:37,086 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: yarn.taskmanager.env.LD_LIBRARY_PATH, /opt/tibco/TIBRV/8.0/lib
2017-09-07 09:49:37,086 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: zookeeper.sasl.disable, true
2017-09-07 09:49:37,103 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-09-07 09:49:37,103 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 4
2017-09-07 09:49:37,103 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
2017-09-07 09:49:37,103 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 1024
2017-09-07 09:49:37,118 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at flink-01/10.1.0.4:8032
2017-09-07 09:49:39,084 INFO org.apache.flink.yarn.Utils - Copying from file:/usr/local/flink-1.2.0/lib to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/lib
2017-09-07 09:49:43,419 INFO org.apache.flink.yarn.Utils - Copying from file:/usr/local/flink-1.2.0/conf/log4j.properties to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/log4j.properties
2017-09-07 09:49:43,552 INFO org.apache.flink.yarn.Utils - Copying from file:/usr/local/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/flink-dist_2.11-1.2.0.jar
2017-09-07 09:49:43,816 INFO org.apache.flink.yarn.Utils - Copying from /usr/local/flink-1.2.0/conf/flink-conf.yaml to hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001/flink-conf.yaml
2017-09-07 09:49:43,903 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1504745288687_0001
2017-09-07 09:49:44,011 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1504745288687_0001
2017-09-07 09:49:44,011 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2017-09-07 09:49:44,030 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2017-09-07 09:49:50,326 INFO org.apache.flink.yarn.YarnClusterDescriptor - YARN application has been deployed successfully.
Exception in thread "main" java.lang.RuntimeException: Failed to retrieve JobManager address
    at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:248)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:627)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:175)
    at org.apache.flink.client.program.ClusterClient.getJobManagerAddress(ClusterClient.java:242)
    ... 9 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at scala.concurrent.Await.result(package.scala)
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:173)
    ... 10 more
2017-09-07 09:50:51,519 INFO org.apache.flink.yarn.YarnClusterClient - Shutting down YarnClusterClient from the client shutdown hook
2017-09-07 09:50:51,519 INFO org.apache.flink.yarn.YarnClusterClient - Sending shutdown request to the Application Master
2017-09-07 09:50:51,549 INFO org.apache.flink.yarn.YarnClusterClient - Start application client.
2017-09-07 09:50:51,549 INFO org.apache.flink.yarn.YarnClusterClient - Starting client actor system.
2017-09-07 09:50:51,807 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2017-09-07 09:50:51,836 INFO Remoting - Starting remoting
2017-09-07 09:50:51,936 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://flink@flink-01:45463]
2017-09-07 09:50:51,954 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:52,967 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:53,986 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:55,007 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:56,026 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:57,047 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:58,067 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:50:59,087 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:51:00,107 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:51:01,127 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:51:01,944 WARN org.apache.flink.yarn.YarnClusterClient - Error while stopping YARN cluster.
java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.ready(package.scala:169)
    at scala.concurrent.Await.ready(package.scala)
    at org.apache.flink.yarn.YarnClusterClient.shutdownCluster(YarnClusterClient.java:372)
    at org.apache.flink.yarn.YarnClusterClient.finalizeCluster(YarnClusterClient.java:342)
    at org.apache.flink.client.program.ClusterClient.shutdown(ClusterClient.java:208)
    at org.apache.flink.yarn.YarnClusterClient$ClientShutdownHook.run(YarnClusterClient.java:446)
2017-09-07 09:51:01,946 INFO org.apache.flink.yarn.YarnClusterClient - Deleted Yarn properties file at /tmp/.yarn-properties-bistel
2017-09-07 09:51:01,946 INFO org.apache.flink.yarn.YarnClusterClient - Deleting files in hdfs://flink-01:9000/user/bistel/.flink/application_1504745288687_0001
2017-09-07 09:51:02,146 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2017-09-07 09:51:02,490 INFO org.apache.flink.yarn.YarnClusterClient - Application application_1504745288687_0001 finished with state RUNNING and final state UNDEFINED at 0
2017-09-07 09:51:02,490 INFO org.apache.flink.yarn.YarnClusterClient - YARN Client is shutting down
2017-09-07 09:51:02,598 INFO org.apache.flink.yarn.ApplicationClient - Stopped Application client.
2017-09-07 09:51:02,599 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
2017-09-07 09:51:02,633 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2017-09-07 09:51:02,635 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2017-09-07 09:51:02,651 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.

[bistel@flink-01 ~]$ jps
1888 ResourceManager
2000 NodeManager
2433 NameNode
2546 DataNode
2754 SecondaryNameNode
3143 Jps
1724 QuorumPeerMain

[bistel@flink-02 ~]$ jps
2018 Jps
1721 NodeManager
1881 DataNode
1515 QuorumPeerMain

[bistel@flink-03 ~]$ jps
1521 QuorumPeerMain
2054 YarnApplicationMasterRunner
1724 NodeManager
1885 DataNode
2142 Jps

[bistel@flink-04 ~]$ jps
2090 Jps
1515 QuorumPeerMain
1789 NodeManager
1950 DataNode

Nico Kruber wrote
> I looked at the commit you cherry-picked and nothing in there explains the
> error you got. This rather sounds like something might be mixed up between
> (remaining artefacts of) flink 1.3 and 1.2.
> [...]
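Given that the AM (YarnApplicationMasterRunner) does come up but the client still times out after 60 s, one debugging knob worth trying is the client-side timeouts in flink-conf.yaml. This assumes Flink 1.2's `akka.client.timeout` option, whose 60 s default matches the 60000 ms in the trace; the values below are debugging assumptions, not recommendations:

```yaml
# flink-conf.yaml - give the client more time to retrieve the leader
# address from ZooKeeper while investigating.
akka.client.timeout: 300 s
akka.lookup.timeout: 30 s
```

Independently, it may be worth checking with ZooKeeper's zkCli.sh whether the JobManager actually wrote a leader znode; based on the path.root (/flink) and path.namespace (/cluster_one) settings in the log above, it would presumably live under /flink/cluster_one, though the exact znode layout here is an assumption.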