[Question] Failed to submit flink job to secure yarn cluster

Ethan Li
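Hello,

I was following https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#run-a-flink-job-on-yarn and trying to submit a Flink job on YARN.

I downloaded flink-1.9.1 and the pre-bundled Hadoop 2.8.3 from https://flink.apache.org/downloads.html#apache-flink-191. I used the default configs except: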

security.kerberos.login.keytab: userA.keytab
security.kerberos.login.principal: userA@REALM


I already have a secure YARN cluster set up. I then ran

./bin/flink run -m yarn-cluster -p 1 -yjm 1024m -ytm 1024m ./examples/streaming/WordCount.jar

and got the following error:


org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:251)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1578605412668_0005 to YARN : Failed to renew token: Kind: kms-dt, Service: host3.com:3456, Ident: (owner=userA, renewer=adminB, realUser=, issueDate=1578606224956, maxDate=1579211024956, sequenceNumber=32, masterKeyId=52)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:275)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1004)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
    ... 9 more


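Full client log: https://gist.github.com/Ethanlm/221284bcaa272270a799957dc05b94fd
Resource manager log: https://gist.github.com/Ethanlm/ecd0a3eb25582ad6b1552927fc0e5c47
Hostnames, IP addresses, usernames, etc. are anonymized.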


Not sure how to proceed further. Wondering if anyone in the community has encountered this before. Thank you very much for your time!

Best,
Ethan

Re: [Question] Failed to submit flink job to secure yarn cluster

Yangze Guo
Hi, Ethan

You could first check your cluster by following this guide [1] and
verify that all the related config options [2] are set correctly.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/security-kerberos.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#security-kerberos-login-contexts

Best,
Yangze Guo

On Fri, Jan 10, 2020 at 10:37 AM Ethan Li <[hidden email]> wrote:

>
> Hello
>
> I was following  https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#run-a-flink-job-on-yarn and trying to submit a flink job on yarn.
>
> I downloaded flink-1.9.1 and pre-bundled Hadoop 2.8.3 from https://flink.apache.org/downloads.html#apache-flink-191. I used default configs except:
>
> security.kerberos.login.keytab: userA.keytab
> security.kerberos.login.principal: userA@REALM
>
>
> I have a secure Yarn cluster set up already. Then I ran “ ./bin/flink run -m yarn-cluster -p 1 -yjm 1024m -ytm 1024m ./examples/streaming/WordCount.jar” and got the following errors:
>
>
> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
> at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:251)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1578605412668_0005 to YARN : Failed to renew token: Kind: kms-dt, Service: host3.com:3456, Ident: (owner=userA, renewer=adminB, realUser=, issueDate=1578606224956, maxDate=1579211024956, sequenceNumber=32, masterKeyId=52)
> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:275)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1004)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
> at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
> ... 9 more
>
>
> Full client log:https://gist.github.com/Ethanlm/221284bcaa272270a799957dc05b94fd
> Resource manager log: https://gist.github.com/Ethanlm/ecd0a3eb25582ad6b1552927fc0e5c47
> Hostname, IP address, username and etc. are anonymized.
>
>
> Not sure how to proceed further. Wondering if anyone in the community has encountered this before. Thank you very much for your time!
>
> Best,
> Ethan
>

Re: [Question] Failed to submit flink job to secure yarn cluster

Ethan Li
Hi Yangze,

Thanks for your reply. Those are the docs I have read and followed. (I was also able to set up a standalone Flink cluster with secure HDFS, ZooKeeper and Kafka.)

Could you please let me know what I am missing? Thanks


Best,
Ethan

Re: [Question] Failed to submit flink job to secure yarn cluster

Yang Wang
I am not familiar with Kerberos. However, I found "keyProvider null cannot renew token" in the YARN
ResourceManager logs. Could you please check whether the key provider has been configured correctly?


Best,
Yang
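
The exception in the original post points the same way: the token that failed to renew has Kind "kms-dt", which is the kind Hadoop uses for KMS delegation tokens. A minimal Python sketch for pulling that line apart is shown here. The message string is copied from the stack trace in this thread; the helper names parse_token and millis_to_utc are made up for illustration and are not part of Flink or Hadoop. It assumes issueDate and maxDate are epoch milliseconds, in which case the gap between them is seven days.

```python
import re
from datetime import datetime, timedelta, timezone

# Copied from the "Caused by" line of the stack trace in this thread.
MESSAGE = (
    "Failed to renew token: Kind: kms-dt, Service: host3.com:3456, "
    "Ident: (owner=userA, renewer=adminB, realUser=, issueDate=1578606224956, "
    "maxDate=1579211024956, sequenceNumber=32, masterKeyId=52)"
)

TOKEN_PATTERN = re.compile(
    r"Kind: (?P<kind>[^,]+), Service: (?P<service>[^,]+), Ident: \((?P<ident>[^)]*)\)"
)


def parse_token(message):
    """Split the token description into a flat dict of string fields (hypothetical helper)."""
    match = TOKEN_PATTERN.search(message)
    if match is None:
        raise ValueError("no token description found in message")
    fields = {"kind": match.group("kind"), "service": match.group("service")}
    # The identifier is a comma-separated list of key=value pairs; values may be empty.
    for pair in match.group("ident").split(", "):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def millis_to_utc(ms):
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    epoch = datetime.fromtimestamp(0, tz=timezone.utc)
    return epoch + timedelta(milliseconds=int(ms))


token = parse_token(MESSAGE)
lifetime = millis_to_utc(token["maxDate"]) - millis_to_utc(token["issueDate"])

print("token kind:", token["kind"])
print("service:   ", token["service"])
print("renewer:   ", token["renewer"])
print("issued:    ", millis_to_utc(token["issueDate"]).isoformat())
print("lifetime:  ", lifetime)
```

Since the kind is kms-dt and the Service field names the KMS endpoint, the ResourceManager has to be able to reach and talk to that KMS in order to renew the token, which is why its key provider configuration matters here.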


Re: [Question] Failed to submit flink job to secure yarn cluster

Ethan Li
Sorry, I forgot to update this thread.

I figured it out. KMS is not set up correctly in my environment, and the ResourceManager is also missing the key provider config. PE is fixing it.

Thanks for looking into this.

Ethan Li
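
For anyone hitting the same error, a rough way to check whether a Hadoop site file seen by the ResourceManager declares a key provider is sketched below in Python. This is only a sketch under assumptions: the thread does not say which property was missing, so the two property names checked here (hadoop.security.key.provider.path and the older dfs.encryption.key.provider.uri) are the Hadoop settings I would expect to matter, and the sample XML and its kms:// URI are hypothetical, reusing the anonymized host3.com:3456 from the error message.

```python
import xml.etree.ElementTree as ET

# Property names assumed to be relevant; the thread does not name the missing one.
KEY_PROVIDER_PROPERTIES = (
    "hadoop.security.key.provider.path",
    "dfs.encryption.key.provider.uri",
)


def find_key_providers(site_xml_text):
    """Return {property name: value} for non-empty key provider settings in a *-site.xml document."""
    root = ET.fromstring(site_xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in KEY_PROVIDER_PROPERTIES and value:
            found[name] = value
    return found


# Hypothetical core-site.xml content, for illustration only.
SAMPLE_WITH_PROVIDER = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://https@host3.com:3456/kms</value>
  </property>
</configuration>"""

SAMPLE_WITHOUT_PROVIDER = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
</configuration>"""

for label, text in (("with", SAMPLE_WITH_PROVIDER), ("without", SAMPLE_WITHOUT_PROVIDER)):
    providers = find_key_providers(text)
    status = providers if providers else "no key provider configured"
    print(label, "->", status)
```

To use it against a real cluster, read the core-site.xml and hdfs-site.xml from the ResourceManager's configuration directory and pass their contents to find_key_providers; an empty result on the ResourceManager host would be consistent with the "keyProvider null" message in its log.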
