Flink application failing with Kerberos issue after running successfully without any issues for a few days


Raja.Aravapalli

Hi Ted,

 

Find below the configuration I see in yarn-site.xml

 

<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>
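For readers who want to look up such a property without reading the file by eye, here is a minimal sketch. It assumes the usual Hadoop `*-site.xml` layout of `<property>` elements inside a `<configuration>` root; the function name `read_site_properties` and the inline sample are illustrative, not part of Hadoop or Flink.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop-style site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Inline sample mirroring the snippet posted above (the <configuration>
# wrapper is an assumption about the rest of the file).
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

props = read_site_properties(SAMPLE)
print(props["yarn.resourcemanager.proxy-user-privileges.enabled"])  # true
```

This only needs read access to the file, so it can be used even when the files themselves are managed by another team.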

 

 

Regards,

Raja.

 

 

From: Ted Yu <[hidden email]>
Date: Wednesday, August 16, 2017 at 9:05 PM
To: Raja Aravapalli <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: [EXTERNAL] Re: hadoop

 

Can you check the following config in yarn-site.xml?

 

yarn.resourcemanager.proxy-user-privileges.enabled (true)

 

Cheers

 

On Wed, Aug 16, 2017 at 4:48 PM, Raja.Aravapalli <[hidden email]> wrote:

 

Hi,

 

I started a Flink yarn-session on a running Hadoop cluster and launched a streaming application on it.

But after a few days of running without any issues, the Flink application, which writes data to HDFS, fails with the exception below.

 

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token xxxxxx for xxxxxx) can't be found in cache

 

 

Can someone please help me fix this? Thanks a lot.

 

 

 

Regards,

Raja.

 


Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Ted Yu
What are the values for the following parameters?

dfs.namenode.delegation.token.max-lifetime

dfs.namenode.delegation.token.renew-interval

Cheers


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Raja.Aravapalli

Hi Ted,

 

Below is what I see in the environment:

 

dfs.namenode.delegation.token.max-lifetime:          604800000

dfs.namenode.delegation.token.renew-interval:      86400000

 

 

Thanks.

 

 

Regards,

Raja.

 


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Ted Yu
Can you try shortening the renewal interval to something like 28800000?

Cheers


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Raja.Aravapalli

 

I don’t have access to the site.xml files; they are controlled by a support team.

Does Flink have any configuration settings or APIs through which we can control this?

 

 

Regards,

Raja.

 


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Ted Yu
I think this needs to be done by the admin.


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Eron Wright
Raja,
According to those configuration values, the delegation token would be automatically renewed every 24 hours, then expire entirely after 7 days. You say that the job ran without issue for 'a few days'. Can we conclude that the job hit the 7-day delegation token (DT) expiration?

Flink supports the use of Kerberos keytabs as an alternative to delegation tokens for exactly this reason: delegation tokens eventually expire and so aren't useful to a long-running program. Consider making use of keytabs here.

Hope this helps!
-Eron
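
The arithmetic behind the 24-hour and 7-day figures can be checked directly. This is a minimal sketch that assumes the values reported earlier in the thread are in milliseconds, the unit Hadoop documents for these properties; the constant and function names are illustrative only.

```python
from datetime import timedelta

# Values quoted earlier in the thread, assumed to be milliseconds.
MAX_LIFETIME_MS = 604_800_000      # dfs.namenode.delegation.token.max-lifetime
RENEW_INTERVAL_MS = 86_400_000     # dfs.namenode.delegation.token.renew-interval
SUGGESTED_RENEW_MS = 28_800_000    # shorter renew interval suggested in the thread


def ms_to_timedelta(ms: int) -> timedelta:
    """Convert a millisecond count into a timedelta for readable output."""
    return timedelta(milliseconds=ms)


max_lifetime = ms_to_timedelta(MAX_LIFETIME_MS)
renew_interval = ms_to_timedelta(RENEW_INTERVAL_MS)
suggested_renew = ms_to_timedelta(SUGGESTED_RENEW_MS)

print(max_lifetime)       # 7 days, 0:00:00
print(renew_interval)     # 1 day, 0:00:00
print(suggested_renew)    # 8:00:00

# Renew intervals that fit inside the maximum lifetime.
print(MAX_LIFETIME_MS // RENEW_INTERVAL_MS)  # 7
```

Note that a shorter renew interval changes only how often the token must be renewed; the 7-day maximum lifetime is a separate setting and stays the same.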



Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Raja.Aravapalli

 

Thanks a lot Eron…

 

If I understand you correctly, you suggest using keytabs to launch streaming applications.

 

Can you please confirm if I have to use the below settings to ensure I use keytabs?

 

  • security.kerberos.login.use-ticket-cache: Indicates whether to read from your Kerberos ticket cache (default: true).

  • security.kerberos.login.keytab: Absolute path to a Kerberos keytab file that contains the user credentials.

  • security.kerberos.login.principal: Kerberos principal name associated with the keytab.

  • security.kerberos.login.contexts: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, Client,KafkaClient to use the credentials for ZooKeeper authentication and for Kafka authentication).

 

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#kerberos-based-security-1

 

 

Also, a quick question: once I make these changes to use keytabs instead of the ticket cache, is there any place in the logs where I can check that the settings I made are in use and that the applications are not falling back to the ticket cache?

 

Thanks a lot, in advance.

 

 

Regards,

Raja.

 


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

vprabhu@gmail.com
In reply to this post by Eron Wright
+1 on the 7-day expiry explanation. This is most likely the cause.

I faced the 7-day expiry issue with a previous version of Flink that didn't support keytabs. I am currently running Flink 1.3 with keytabs (it has been going okay for 2 days now); I will update after the 7-day mark.

Thanks,
Prabhu


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Tzu-Li (Gordon) Tai
In reply to this post by Raja.Aravapalli
Hi Raja,

> Can you please confirm if I have to use the below settings to ensure I use keytabs?
>
>   • security.kerberos.login.use-ticket-cache: Indicates whether to read from your Kerberos ticket cache (default: true).
>   • security.kerberos.login.keytab: Absolute path to a Kerberos keytab file that contains the user credentials.
>   • security.kerberos.login.principal: Kerberos principal name associated with the keytab.
>   • security.kerberos.login.contexts: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, Client,KafkaClient to use the credentials for ZooKeeper authentication and for Kafka authentication).

Yes, these are the exact configs that you’ll need to set.
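
As a rough way to double-check a configuration before submitting a job, the sketch below parses flat `key: value` lines of the kind used in flink-conf.yaml and reports whether both a keytab and a principal are set. The helper names, the keytab path, the principal, and the choice to set `use-ticket-cache` to `false` in the sample are all assumptions made for illustration; they are not taken from this thread or from Flink itself.

```python
def parse_flink_conf(text: str) -> dict:
    """Parse flat 'key: value' lines, ignoring blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def keytab_login_configured(conf: dict) -> bool:
    """True when both the keytab path and its principal are non-empty."""
    return bool(conf.get("security.kerberos.login.keytab")) and bool(
        conf.get("security.kerberos.login.principal")
    )


# Hypothetical sample values; replace with the real path and principal.
SAMPLE = """
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/user.keytab   # hypothetical path
security.kerberos.login.principal: user@EXAMPLE.COM    # hypothetical principal
security.kerberos.login.contexts: Client,KafkaClient
"""

conf = parse_flink_conf(SAMPLE)
print(keytab_login_configured(conf))  # True
```

The check is deliberately narrow: it only confirms that the two keytab-related keys are present, not that the keytab file exists or that the principal matches it.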


> Also, a quick question: once I make these changes to use keytabs instead of the ticket cache, is there any place in the logs where I can check that the settings I made are in use and that the applications are not falling back to the ticket cache?

You should be able to find logs such as “Adding keytab <keytab path> to the AM container …” at the beginning of the job submission.
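
One way to look for that message is a simple text search over the submission log. The sketch below assumes the log is available as plain text and that the message begins with the phrase quoted above; the exact wording may differ between Flink versions, and the sample log lines and the helper name `keytab_log_lines` are made up for illustration.

```python
def keytab_log_lines(log_text: str, marker: str = "Adding keytab") -> list:
    """Return every log line containing the marker phrase."""
    return [line for line in log_text.splitlines() if marker in line]


# Hypothetical log excerpt; only the 'Adding keytab ... to the AM container'
# phrase comes from the thread, everything else is invented.
SAMPLE_LOG = """\
INFO  (hypothetical) starting job submission
INFO  (hypothetical) Adding keytab /path/to/user.keytab to the AM container ...
INFO  (hypothetical) submission finished
"""

matches = keytab_log_lines(SAMPLE_LOG)
print(len(matches))  # 1
```

An empty result on a real log would suggest the keytab settings were not picked up, though it could also mean the message text differs in the version being run.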


Cheers,
Gordon


Re: [EXTERNAL] Re: Flink application failing with Kerberos issue after running successfully without any issues for a few days

Raja.Aravapalli

 

Thanks Gordon.

 

 

Regards,

Raja.

 

From: "Tzu-Li (Gordon) Tai" <[hidden email]>
Date: Thursday, August 17, 2017 at 11:47 PM
To: Raja Aravapalli <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [EXTERNAL] Re: Fink application failing with kerberos issue after running successfully without any issues for few days

 

Hi Raja,

 

Can you please confirm if I have to use the below settings to ensure I use keytabs?

 

  • security.kerberos.login.use-ticket-cache:

Indicates whether to read from your Kerberos ticket cache (default: true).

 

  • security.kerberos.login.keytab:

Absolute path to a Kerberos keytab file that contains the user credentials.

 

  • security.kerberos.login.principal:

Kerberos principal name associated with the keytab.

 

  • security.kerberos.login.contexts: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, Client,KafkaClient to use the credentials for ZooKeeper authentication and for Kafka authentication).

Yes, these are the exact configs that you’ll need to set.
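
As an illustration only, the four settings might look like this in flink-conf.yaml. The keytab path and the principal are placeholders, not values from this thread, and setting use-ticket-cache to false is an assumption about how to make sure the ticket cache is not picked up; check the Flink documentation for your version before relying on it.

```yaml
# flink-conf.yaml (sketch; path and principal below are hypothetical placeholders)
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/user.keytab
security.kerberos.login.principal: user@EXAMPLE.COM
# Login contexts that should receive the credentials (ZooKeeper and Kafka here)
security.kerberos.login.contexts: Client,KafkaClient
```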

 

Also, a quick question: once I make these changes to use keytabs instead of the ticket cache, is there any place in the logs where I can check that the settings I made are in use and that the applications are not falling back to the ticket cache?

You should be able to find logs such as “Adding keytab <keytab path> to the AM container …” at the beginning of the job submission.
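
One way to look for that message is to scan the job submission log for the phrase "Adding keytab". The following is a minimal sketch in Python; the function name and the sample lines are hypothetical, and the exact log wording and log file location may differ between Flink versions.

```python
from typing import Iterable, List

# Phrase taken from the log message quoted above; the exact wording
# may vary between Flink versions (assumption).
KEYTAB_MARKER = "Adding keytab"


def find_keytab_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a keytab being added."""
    return [line.rstrip("\n") for line in log_lines if KEYTAB_MARKER in line]


# Hypothetical in-memory sample; in practice, pass an open log file instead,
# e.g. find_keytab_lines(open(path_to_client_log)).
sample_lines = [
    "some unrelated log line\n",
    "Adding keytab /path/to/user.keytab to the AM container\n",
]
matches = find_keytab_lines(sample_lines)
print(f"{len(matches)} matching line(s) found")
```

If no line matches, the keytab settings were probably not picked up and the session may still be relying on the ticket cache.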

 

Cheers,

Gordon

On 18 August 2017 at 5:51:57 AM, Raja.Aravapalli ([hidden email]) wrote:

 

Thanks a lot Eron…

 

If I understand you correctly, you suggest using keytabs to launch streaming applications.

 

Can you please confirm if I have to use the below settings to ensure I use keytabs?

 

  • security.kerberos.login.use-ticket-cache:

Indicates whether to read from your Kerberos ticket cache (default: true).

 

  • security.kerberos.login.keytab:

Absolute path to a Kerberos keytab file that contains the user credentials.

 

  • security.kerberos.login.principal:

Kerberos principal name associated with the keytab.

 

  • security.kerberos.login.contexts: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, Client,KafkaClient to use the credentials for ZooKeeper authentication and for Kafka authentication).

 

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#kerberos-based-security-1

 

 

Also, a quick question: once I make these changes to use keytabs instead of the ticket cache, is there any place in the logs where I can check that the settings I made are in use and that the applications are not falling back to the ticket cache?

 

Thanks a lot, in advance.

 

 

Regards,

Raja.

 

From: Eron Wright <[hidden email]>
Date: Thursday, August 17, 2017 at 1:06 PM
To: Ted Yu <[hidden email]>
Cc: Raja Aravapalli <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for few days

 

Raja,

According to those configuration values, the delegation token would be automatically renewed every 24 hours, then expire entirely after 7 days.   You say that the job ran without issue for 'a few days'.  Can we conclude that the job hit the 7-day DT expiration?
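
As a quick check of that arithmetic, the two values reported for this cluster (604800000 and 86400000, both in milliseconds) can be converted to days and hours. This is a minimal sketch in Python; the variable names are illustrative only.

```python
from datetime import timedelta

# Values reported for this cluster, in milliseconds.
MAX_LIFETIME_MS = 604_800_000   # dfs.namenode.delegation.token.max-lifetime
RENEW_INTERVAL_MS = 86_400_000  # dfs.namenode.delegation.token.renew-interval

max_lifetime = timedelta(milliseconds=MAX_LIFETIME_MS)
renew_interval = timedelta(milliseconds=RENEW_INTERVAL_MS)

# Number of renewal intervals that fit into the token's maximum lifetime.
renewals_before_expiry = MAX_LIFETIME_MS // RENEW_INTERVAL_MS

print(f"max lifetime:   {max_lifetime.days} days")
print(f"renew interval: {renew_interval.total_seconds() / 3600:.0f} hours")
print(f"renewal intervals before hard expiry: {renewals_before_expiry}")
```

So the token can be renewed daily, but no renewal extends it past the 7-day maximum lifetime.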

 

Flink supports Kerberos keytabs as an alternative to delegation tokens for exactly this reason: delegation tokens eventually expire, so they are not suitable for a long-running program. Consider using keytabs here.

 

Hope this helps!

-Eron

 

 

On Thu, Aug 17, 2017 at 9:58 AM, Ted Yu <[hidden email]> wrote:

I think this needs to be done by the admin.

 

On Thu, Aug 17, 2017 at 9:37 AM, Raja.Aravapalli <[hidden email]> wrote:

 

I don’t have access to the site.xml files; they are controlled by a support team.

 

Does Flink have any configuration settings or APIs through which we can control this?

 

 

Regards,

Raja.

 

From: Ted Yu <[hidden email]>
Date: Thursday, August 17, 2017 at 11:07 AM
To: Raja Aravapalli <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for few days

 

Can you try shortening the renewal interval to something like 28800000 (8 hours)?

 

Cheers

 

On Thu, Aug 17, 2017 at 8:58 AM, Raja.Aravapalli <[hidden email]> wrote:

Hi Ted,

 

Below is what I see in the environment:

 

dfs.namenode.delegation.token.max-lifetime:          604800000

dfs.namenode.delegation.token.renew-interval:      86400000

 

 

Thanks.

 

 

Regards,

Raja.

 

From: Ted Yu <[hidden email]>
Date: Thursday, August 17, 2017 at 10:46 AM
To: Raja Aravapalli <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for few days

 

What are the values for the following parameters ?

 

dfs.namenode.delegation.token.max-lifetime

 

dfs.namenode.delegation.token.renew-interval

 

Cheers

 

On Thu, Aug 17, 2017 at 8:24 AM, Raja.Aravapalli <[hidden email]> wrote:

Hi Ted,

 

Find below the configuration I see in yarn-site.xml

 

<property>

      <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>

      <value>true</value>

    </property>

 

 

Regards,

Raja.

 

 

From: Ted Yu <[hidden email]>
Date: Wednesday, August 16, 2017 at 9:05 PM
To: Raja Aravapalli <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: [EXTERNAL] Re: hadoop

 

Can you check the following config in yarn-site.xml ?

 

yarn.resourcemanager.proxy-user-privileges.enabled (true)

 

Cheers

 

On Wed, Aug 16, 2017 at 4:48 PM, Raja.Aravapalli <[hidden email]> wrote:

 

Hi,

 

I started a Flink yarn-session on a running Hadoop cluster and am running a streaming application on it.

 

However, after a few days of running without any issues, the Flink application, which writes data to HDFS, fails with the exception below.

 

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token xxxxxx for xxxxxx) can't be found in cache

 

 

Can someone please help me fix this? Thanks a lot.

 

 

 

Regards,

Raja.