Hi Max,
I will try these workarounds.
Thanks
Thomas
________________________________________
From: Maximilian Michels [[hidden email]]
Sent: Tuesday, March 15, 2016 16:51
To: [hidden email]
Cc: Niels Basjes
Subject: Re: Flink job on secure Yarn fails after many hours
Hi Thomas,
Niels (CC) and I found out that you need at least Hadoop 2.6.1 to run
Kerberos-secured applications reliably on Hadoop clusters. Earlier
versions have critical bugs in the internal security token handling
that can expire a token even though it is still valid.
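To check a cluster against this minimum, a small helper like the one below could be used. It is only a sketch: the name meets_min_hadoop is hypothetical, it assumes a bare version string such as 2.6.0 or 2.6.0-cdh5.4.0, and vendor builds may backport fixes, so a lower number does not prove the bug is present.

```python
# Hypothetical helper: compare a Hadoop version string against the
# 2.6.1 minimum. Vendor suffixes such as "-cdh5.4.0" are ignored, and
# backported fixes in vendor builds are NOT detected.

MIN_HADOOP = (2, 6, 1)


def meets_min_hadoop(version: str, minimum: tuple = MIN_HADOOP) -> bool:
    """Return True if e.g. '2.6.0' or '2.7.1-cdh5.x' is >= minimum."""
    # Keep only the leading numeric part: '2.7.1-cdh5.x' -> '2.7.1'
    core = version.split("-", 1)[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad missing components so '2.7' compares as (2, 7, 0)
    while len(parts) < len(minimum):
        parts.append(0)
    # Numeric tuple comparison, so '2.10.0' correctly beats '2.6.1'
    return tuple(parts[: len(minimum)]) >= minimum


for v in ["2.6.0", "2.6.1", "2.7.1"]:
    print(v, meets_min_hadoop(v))
```

Comparing integer tuples rather than raw strings avoids the classic trap where "2.10.0" sorts before "2.6.1" lexicographically.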
That said, Hadoop has another limitation: the maximum internal token
lifetime is one week. To work around this limit, you have two
options:
a) Increase the maximum token lifetime.
In yarn-site.xml:
<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>
In hdfs-site.xml:
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>
b) Set up the YARN ResourceManager as a proxy user for the HDFS NameNode:
From http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html
"You can work around this by configuring the ResourceManager as a
proxy user for the corresponding HDFS NameNode so that the
ResourceManager can request new tokens when the existing ones are past
their maximum lifetime."
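A simplified model may help show why a job dies after about a week and how each option avoids it. This is not Hadoop's actual code: it assumes a renewal can push a token's expiry forward by one renew interval but never past issue time plus max lifetime, and it assumes a one-day renew interval. The function names are hypothetical. The value 9223372036854775807 used in option a) is 2^63 - 1 (Java's Long.MAX_VALUE) milliseconds, i.e. effectively no limit.

```python
# Simplified, hypothetical model of delegation-token renewal.
# All times are in milliseconds, like the *.max-lifetime properties.

DAY_MS = 24 * 60 * 60 * 1000
DEFAULT_MAX_LIFETIME_MS = 7 * DAY_MS  # the one-week limit
RENEW_INTERVAL_MS = DAY_MS            # assumed renew interval: one day
LONG_MAX = 2**63 - 1                  # 9223372036854775807, as in option a)


def token_expiry(issue_ms: int, now_ms: int, max_lifetime_ms: int) -> int:
    """Expiry after renewing at now_ms: extended by one interval,
    but capped at issue time + max lifetime."""
    return min(now_ms + RENEW_INTERVAL_MS, issue_ms + max_lifetime_ms)


def usable_for_ms(max_lifetime_ms: int, reissue: bool, horizon_ms: int) -> int:
    """Renew once per interval until horizon_ms and return how long the
    job holds a valid token. reissue=True models option b): a fresh
    token is obtained once the old one can no longer be extended."""
    issue = 0
    t = 0
    expiry = 0
    while t < horizon_ms:
        new_expiry = token_expiry(issue, t, max_lifetime_ms)
        if new_expiry <= t:  # renewal can no longer extend the token
            if not reissue:
                return expiry  # last valid moment: the job fails here
            issue = t  # option b): obtain a brand-new token
            new_expiry = token_expiry(issue, t, max_lifetime_ms)
        expiry = new_expiry
        t += RENEW_INTERVAL_MS
    return horizon_ms


horizon = 30 * DAY_MS
print(usable_for_ms(DEFAULT_MAX_LIFETIME_MS, False, horizon) // DAY_MS)  # 7
print(usable_for_ms(LONG_MAX, False, horizon) // DAY_MS)                 # 30
print(usable_for_ms(DEFAULT_MAX_LIFETIME_MS, True, horizon) // DAY_MS)   # 30
```

With the defaults, renewal alone stops working after seven days; raising the max lifetime (option a) or letting the ResourceManager fetch new tokens (option b) keeps the job alive for the whole 30-day horizon in this model.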
@Niels: Could you comment on what worked best for you?
Best,
Max
On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
<[hidden email]> wrote:
>
> Hello everyone,
>
> We are now facing the same problem in our Flink applications, which are launched on YARN.
> I just want to know if there is any update on this exception.
>
> Thanks
>
> Thomas
>
> ________________________________
>
> From: [hidden email] [[hidden email]] on behalf of Niels Basjes [[hidden email]]
> Sent: Friday, December 4, 2015 10:40
> To: [hidden email]
> Subject: Re: Flink job on secure Yarn fails after many hours
>
> Hi Maximilian,
>
> I just downloaded the version from your Google Drive and used it to run my test topology that accesses HBase.
> I deliberately started it twice to double the chance to run into this situation.
>
> I'll keep you posted.
>
> Niels
>
>
> On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <[hidden email]> wrote:
>>
>> Hi Niels,
>>
>> Just got back from our CI. The build above would fail with a
>> Checkstyle error. I corrected that. Also I have built the binaries for
>> your Hadoop version 2.6.0.
>>
>> Binaries:
>>
>> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
>>
>> Thanks,
>> Max
>>
>> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels wrote:
>> >>>> >> >> > 0.0.0.0:41281
>> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager - Actor akka://flink/user/jobmanager#403236912 terminated, stopping process...
>> >>>> >> >> > 21:30:28,286 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Removing web root dir /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > --
>> >>>> >> >> > Best regards / Met vriendelijke groeten,
>> >>>> >> >> >
>> >>>> >> >> > Niels Basjes
>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes