Hi,
I am using Flink in single-job mode on YARN to read data from a Kafka cluster configured for Kerberos. After I upgraded Flink to 1.4.0, the YARN application no longer starts and logs the following error:

    Exception in thread "main" java.lang.RuntimeException: org.apache.flink.configuration.IllegalConfigurationException: Kerberos login configuration is invalid; keytab is unreadable
        at org.apache.flink.yarn.YarnTaskManagerRunner.runYarnTaskManager(YarnTaskManagerRunner.java:160)
        at org.apache.flink.yarn.YarnTaskManager$.main(YarnTaskManager.scala:65)
        at org.apache.flink.yarn.YarnTaskManager.main(YarnTaskManager.scala)
    Caused by: org.apache.flink.configuration.IllegalConfigurationException: Kerberos login configuration is invalid; keytab is unreadable
        at org.apache.flink.runtime.security.SecurityConfiguration.validate(SecurityConfiguration.java:139)
        at org.apache.flink.runtime.security.SecurityConfiguration.<init>(SecurityConfiguration.java:90)
        at org.apache.flink.runtime.security.SecurityConfiguration.<init>(SecurityConfiguration.java:71)
        at org.apache.flink.yarn.YarnTaskManagerRunner.runYarnTaskManager(YarnTaskManagerRunner.java:139)

So I added some logging to the method "SecurityConfiguration.validate()" and rebuilt the Flink package:

    private void validate() {
        if (!StringUtils.isBlank(keytab)) {
            // principal is required
            if (StringUtils.isBlank(principal)) {
                throw new IllegalConfigurationException("Kerberos login configuration is invalid; keytab requires a principal.");
            }

            // check the keytab is readable
            File keytabFile = new File(keytab);

            if (!keytabFile.exists()) {
                throw new IllegalConfigurationException("WTF! keytabFile is not exist ! keytab:" + keytab);
            }

            if (!keytabFile.isFile()) {
                throw new IllegalConfigurationException("WTF! keytabFile is not file ! keytab:" + keytab);
            }

            if (!keytabFile.canRead()) {
                throw new IllegalConfigurationException("WTF! keytabFile is not readalbe ! keytab:" + keytab);
            }

            if (!keytabFile.exists() || !keytabFile.isFile() || !keytabFile.canRead()) {
                throw new IllegalConfigurationException("Kerberos login configuration is invalid; keytab is unreadable");
            }
        }
    }

After that, the YARN logs show this error:

    2017-12-15 17:14:36,314 INFO  org.apache.flink.yarn.YarnTaskManagerRunner - localKeytabPath: /data1/yarn/nm/usercache/hadoop/appcache/application_1513310528578_0009/container_e05_1513310528578_0009_01_000002/krb5.keytab
    2017-12-15 17:14:36,315 INFO  org.apache.flink.yarn.YarnTaskManagerRunner - YARN daemon is running as: hadoop Yarn client user obtainer: hadoop
    2017-12-15 17:14:36,315 INFO  org.apache.flink.yarn.YarnTaskManagerRunner - ResourceID assigned for this container: container_e05_1513310528578_0009_01_000002
    2017-12-15 17:14:36,321 ERROR org.apache.flink.yarn.YarnTaskManagerRunner - Exception occurred while launching Task Manager
    org.apache.flink.configuration.IllegalConfigurationException: WTF! keytabFile is not exist ! keytab:/data1/yarn/nm/usercache/hadoop/appcache/application_1513310528578_0009/container_e05_1513310528578_0009_01_000001/krb5.keytab
        at org.apache.flink.runtime.security.SecurityConfiguration.validate(SecurityConfiguration.java:140)
        at org.apache.flink.runtime.security.SecurityConfiguration.<init>(SecurityConfiguration.java:90)
        at org.apache.flink.runtime.security.SecurityConfiguration.<init>(SecurityConfiguration.java:71)
        at org.apache.flink.yarn.YarnTaskManagerRunner.runYarnTaskManager(YarnTaskManagerRunner.java:139)
        at org.apache.flink.yarn.YarnTaskManager$.main(YarnTaskManager.scala:65)
        at org.apache.flink.yarn.YarnTaskManager.main(YarnTaskManager.scala)

These logs show that the keytab path being validated differs from the "localKeytabPath": the validated path points into container ..._01_000001, while "localKeytabPath" points into the current container ..._01_000002.

I searched the source code of the "org.apache.flink.yarn.YarnTaskManagerRunner" class and found a difference between 1.3.2 and 1.4.0.

1.3.2:

    //To support Yarn Secure Integration Test Scenario
    File krb5Conf = new File(currDir, Utils.KRB5_FILE_NAME);

    if (krb5Conf.exists() && krb5Conf.canRead()) {
        String krb5Path = krb5Conf.getAbsolutePath();
        LOG.info("KRB5 Conf: {}", krb5Path);
        hadoopConfiguration = new org.apache.hadoop.conf.Configuration();
        hadoopConfiguration.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, "kerberos");
        hadoopConfiguration.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, "true");
    }

    // set keytab principal and replace path with the local path of the shipped keytab file in NodeManager
    if (localKeytabPath != null && remoteKeytabPrincipal != null) {
        configuration.setString(SecurityOptions.KERBEROS_LOGIN_KEYTAB, localKeytabPath);
        configuration.setString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL, remoteKeytabPrincipal);
    }

1.4.0:

    //To support Yarn Secure Integration Test Scenario
    File krb5Conf = new File(currDir, Utils.KRB5_FILE_NAME);

    if (krb5Conf.exists() && krb5Conf.canRead()) {
        String krb5Path = krb5Conf.getAbsolutePath();
        LOG.info("KRB5 Conf: {}", krb5Path);
        org.apache.hadoop.conf.Configuration hadoopConfiguration = new org.apache.hadoop.conf.Configuration();
        hadoopConfiguration.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, "kerberos");
        hadoopConfiguration.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, "true");

        // set keytab principal and replace path with the local path of the shipped keytab file in NodeManager
        if (localKeytabPath != null && remoteKeytabPrincipal != null) {
            configuration.setString(SecurityOptions.KERBEROS_LOGIN_KEYTAB, localKeytabPath);
            configuration.setString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL, remoteKeytabPrincipal);
        }

        sc = new SecurityConfiguration(configuration,
            Collections.singletonList(securityConfig -> new HadoopModule(securityConfig, hadoopConfiguration)));

    } else {
        sc = new SecurityConfiguration(configuration);
    }

In the previous version, "SecurityOptions.KERBEROS_LOGIN_KEYTAB" was always set to "localKeytabPath", but in 1.4.0 this only happens if "krb5Conf.exists() && krb5Conf.canRead()" returns true. In my test case, it looks like only the else branch runs.

Is there something I could do to work around this problem?

Thanks!

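To make the difference between the two versions concrete, here is a minimal, self-contained sketch of the control flow only. It is not Flink code: the class KeytabPathRegressionDemo, both method names, the plain Map standing in for Flink's Configuration, and the shortened container paths are made up for illustration. Only the shape of the two if-blocks follows the 1.3.2 and 1.4.0 snippets quoted in the message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the two YarnTaskManagerRunner code paths. */
class KeytabPathRegressionDemo {
    // Assumed to match the key behind SecurityOptions.KERBEROS_LOGIN_KEYTAB.
    static final String KEYTAB_KEY = "security.kerberos.login.keytab";

    /** 1.3.2 shape: the local keytab override runs regardless of krb5.conf. */
    static String effectiveKeytab132(Map<String, String> shipped, boolean krb5ConfReadable,
                                     String localKeytabPath, String principal) {
        Map<String, String> conf = new HashMap<>(shipped);
        if (krb5ConfReadable) {
            // only the Hadoop configuration was prepared here
        }
        if (localKeytabPath != null && principal != null) {
            conf.put(KEYTAB_KEY, localKeytabPath);
        }
        return conf.get(KEYTAB_KEY);
    }

    /** 1.4.0 shape: the override is nested inside the krb5.conf branch. */
    static String effectiveKeytab140(Map<String, String> shipped, boolean krb5ConfReadable,
                                     String localKeytabPath, String principal) {
        Map<String, String> conf = new HashMap<>(shipped);
        if (krb5ConfReadable) {
            if (localKeytabPath != null && principal != null) {
                conf.put(KEYTAB_KEY, localKeytabPath);
            }
        }
        // else branch: configuration is used as shipped from the AM container
        return conf.get(KEYTAB_KEY);
    }

    public static void main(String[] args) {
        // Hypothetical shortened paths: container_01 is the AM, container_02 the TM.
        String amKeytab = "/nm/container_01/krb5.keytab";
        String tmKeytab = "/nm/container_02/krb5.keytab";
        Map<String, String> shipped = new HashMap<>();
        shipped.put(KEYTAB_KEY, amKeytab);

        // No krb5.conf in the container working directory, as in the reported case.
        System.out.println("1.3.2: " + effectiveKeytab132(shipped, false, tmKeytab, "hadoop"));
        System.out.println("1.4.0: " + effectiveKeytab140(shipped, false, tmKeytab, "hadoop"));
    }
}
```

With no readable krb5.conf, the 1.3.2-shaped method returns the TaskManager's own path, while the 1.4.0-shaped method keeps the path from the first container, which matches the mismatch seen in the logs.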
Hey 杨光,
thanks for looking into this in such detail. Unfortunately, I'm not sure what the expected behaviour is (whether the change in behaviour was accidental or on purpose). Let me pull in Gordon who has worked quite a bit on the Kerberos related components in Flink.

@Gordon:
1) Do you know what the expected behaviour is here?
2) How can he work around this issue in 1.4?

– Ufuk

On Fri, Dec 15, 2017 at 11:34 AM, 杨光 <[hidden email]> wrote:
> Hi,
> I am using flink single-job mode on YARN to read data from a kafka
> cluster installation configured for Kerberos. [...]

Hi 杨光,

Thanks a lot for reporting and looking into this with such detail!

Your observations are correct: the changes from 1.3.2 to 1.4.0 in the YarnTaskManagerRunner caused the local keytab path in TMs to not be set correctly.

Unfortunately, AFAIK there is no workaround for this in 1.4.0. Keytabs shipped to TMs live in the working directory of the corresponding YARN container, so the correct local path for the keytab cannot be known upfront. The only scenario in which this would work is if all TM containers happen to be on the same NodeManager as the AM container.

@Eron,
This is a reoccurrence of FLINK-5580 [1], and as you speculated, the TM is using the wrong keytab path again because it was not properly set.
I agree that the integration test scenario is best kept out of the main code. It actually seems to be the cause of this issue this time as well. As you can see in [2], the change was only aiming to refactor the integration test scenario code block, but accidentally affected the keytab path setting.
At the same time, we'll need better unit test coverage for this, as apparently it can break very easily.

I've filed a JIRA for this, with the comments so far included: FLINK-8270 [3]
I will suggest making this a blocker for 1.4.1 / 1.5.0.

On 15 December 2017 at 4:12:24 PM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:
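As an illustration of why the keytab path has to be resolved inside each container, here is a minimal sketch of an unconditional override, the kind of behaviour a unit test could pin down. This is an assumption about what a fix could look like, not the actual FLINK-8270 patch: the class LocalKeytabOverrideSketch, the method withLocalKeytab, the use of a plain Map instead of Flink's Configuration, and the temporary directories standing in for container working directories are all hypothetical. The file name krb5.keytab is taken from the logs in the original report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve the keytab against the container's own working directory. */
class LocalKeytabOverrideSketch {
    // Assumed to match the keys behind SecurityOptions.KERBEROS_LOGIN_KEYTAB / _PRINCIPAL.
    static final String KEYTAB_KEY = "security.kerberos.login.keytab";
    static final String PRINCIPAL_KEY = "security.kerberos.login.principal";
    static final String KEYTAB_FILE_NAME = "krb5.keytab";

    /**
     * Returns a copy of the shipped configuration in which the keytab path points
     * into this container's working directory. Deliberately independent of
     * whether a krb5.conf file is present.
     */
    static Map<String, String> withLocalKeytab(Map<String, String> shipped,
                                               Path containerWorkDir, String principal) {
        Map<String, String> local = new HashMap<>(shipped);
        Path keytab = containerWorkDir.resolve(KEYTAB_FILE_NAME);
        if (principal != null && Files.isReadable(keytab)) {
            local.put(KEYTAB_KEY, keytab.toAbsolutePath().toString());
            local.put(PRINCIPAL_KEY, principal);
        }
        return local;
    }

    public static void main(String[] args) throws IOException {
        // Two temp directories stand in for the AM and TM container working directories.
        Path amDir = Files.createTempDirectory("container_am");
        Path tmDir = Files.createTempDirectory("container_tm");
        Files.createFile(tmDir.resolve(KEYTAB_FILE_NAME));

        // The shipped configuration still carries the AM container's path.
        Map<String, String> shipped = new HashMap<>();
        shipped.put(KEYTAB_KEY, amDir.resolve(KEYTAB_FILE_NAME).toString());

        Map<String, String> local = withLocalKeytab(shipped, tmDir, "hadoop");
        System.out.println("shipped keytab: " + shipped.get(KEYTAB_KEY));
        System.out.println("local keytab:   " + local.get(KEYTAB_KEY));
    }
}
```

The point of the sketch is that the override takes only the container's working directory and the principal as input, so it cannot be switched off by an unrelated condition such as the presence of krb5.conf.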