Dear all,

I ran into some trouble starting Flink in a YARN container on Cloudera. I have a start script like this:

slaxxxx:/applvg/home/flink/mvp $ cat run.sh
export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/
export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
/applvg/home/flink/mvp/flink-1.2.0/bin/yarn-session.sh -n 4 -s 3 -st -jm 2048 -tm 2048 -qu root.mr-spark.avp -d

When I execute this script, the output looks like this:

sla09037:/applvg/home/flink/mvp $ ./run.sh
2017-05-11 15:13:24,541 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:24,571 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:25,000 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to [hidden email] (auth:KERBEROS)
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:25,050 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 4
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 2048
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 2048
2017-05-11 15:13:25,903 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-05-11 15:13:25,962 WARN org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/applvg/home/flink/mvp/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2017-05-11 15:13:25,972 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/lib
2017-05-11 15:13:27,522 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/log4j.properties to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/log4j.properties
2017-05-11 15:13:27,552 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/logback.xml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/logback.xml
2017-05-11 15:13:27,584 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-dist_2.11-1.2.0.jar
2017-05-11 15:13:28,508 INFO org.apache.flink.yarn.Utils - Copying from /applvg/home/flink/mvp/flink-1.2.0/conf/flink-conf.yaml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-conf.yaml
2017-05-11 15:13:28,553 INFO org.apache.flink.yarn.YarnClusterDescriptor - Adding delegation token to the AM container..
2017-05-11 15:13:28,563 INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 27247 for flink on ha-hdfs:nameservice1
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:421)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lfrar256.srv.company;lfrar257.srv.company
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationTokenService(KMSClientProvider.java:823)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:779)
    at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
    at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
    at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:154)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:753)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:419)
    ... 9 more
Caused by: java.net.UnknownHostException: lfrarXXX1.srv.company;lfrarXXX2.srv.company
    ... 20 more

It seems that Flink found these hosts here:

slaxxxxx:/applvg/home/flink/mvp $ grep -r "lfrarXXX1.srv.company;lfrarXXX2.srv.company" /etc/hadoop/conf
/etc/hadoop/conf/core-site.xml: <value>kms://[hidden email];lfrarXXX2.srv.company:16000/kms</value>
/etc/hadoop/conf/hdfs-site.xml: <value>kms://[hidden email];lfrarXXX2.srv.company:16000/kms</value>

So I guess that Flink got this connection string from the Cloudera config and "forgot" to split it at the ";". If I ping each of those hosts individually, everything works.

Do you have any hints on how to avoid this problem?

Best wishes
Dominique
|
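The hypothesis in the post above is that the ";"-separated KMS host list is handed to the resolver as a single hostname. The stack trace points that way, since `SecurityUtil.buildTokenService` reports the combined string as the unknown host. As a minimal sketch for checking each host on its own, the shell function below splits the host part of a `kms://` URI at the ";". The function name `kms_hosts`, the `http@` prefix, and the sample host names are illustrative assumptions; this does not reproduce what Hadoop does internally.

```shell
#!/bin/sh
# Sketch only: split the host part of a kms:// URI at ';'.
# The URI layout follows the grep output in the post; the function name
# and the sample host names below are hypothetical.

kms_hosts() {
  authority="${1#kms://}"        # drop the scheme
  authority="${authority%%/*}"   # drop the path (e.g. /kms)
  authority="${authority#*@}"    # drop an optional "user@" prefix
  authority="${authority%:*}"    # drop the trailing ":port", if any
  # Print one host per line.
  printf '%s\n' "$authority" | tr ';' '\n'
}

# Example with placeholder host names:
kms_hosts 'kms://http@host1.example;host2.example:16000/kms'
```

Each printed host can then be checked individually, for example with `ping` as described in the post, to confirm that only the combined string fails to resolve.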
Hi Dominique,

I’m not exactly sure, but this looks more like a Hadoop or Hadoop configuration problem to me. Could it be that the Hadoop version you’re running does not support the specification of multiple KMS servers via

Cheers,

On Thu, May 11, 2017 at 4:06 PM, Dominique Rondé <[hidden email]> wrote:
Dear all,
|
Dominique:

Which Hadoop release are you using? Please pastebin the classpath.

Cheers

On Thu, May 11, 2017 at 7:27 AM, Till Rohrmann <[hidden email]> wrote:
|
I ran into the same problem, and I'm using Hadoop 2.6.0-cdh5.7.1. Thanks!
|