Kerberos on YARN: delegation or proxying?

stefanobaghino
Hello everybody,

I'm running some tests on how Flink as a long-running YARN session handles security with Kerberos. In particular, I'm running a test where I run Flink on YARN with a service account and then deploy a job via CLI as another user; in the job I'm trying to access a private folder of the former on HDFS but the job fails due to permission issues (the user running the job is actually the one who ran Flink on YARN in the first place — the service account).

I'm running Flink 1.0.0-RC5, launching the long-running session with:

bin/yarn-session.sh -n 2 -tm 4096 -s 3

and then running the following command:

bin/flink run examples/batch/WordCount.jar \
--input hdfs:///user/stefano.baghino/hamlet.txt \
--output hdfs:///user/stefano.baghino/hamlet.out

Here are the logs:
https://gist.github.com/stefanobaghino/6605ec33a1c4b632fb78

It looks like the YARN session is acting as a proxy for the user instead of receiving a delegation. Is there a way to change this behavior? Is this by design? Is there an interest in implementing the delegation (if it's not already implemented)? Otherwise, is there a workaround, apart from running one-off jobs on YARN?

Thank you so much in advance.

--
BR,
Stefano Baghino

Software Engineer @ Radicalbit

Re: Kerberos on YARN: delegation or proxying?

stefanobaghino
In the initial description, I meant "I'm trying to access a private folder of the latter", so not the service account. Sorry for the mistake.

On Sun, Mar 6, 2016 at 8:54 PM, Stefano Baghino <[hidden email]> wrote:
Hello everybody,

I'm running some tests on how Flink as a long-running YARN session handles security with Kerberos. In particular, I'm running a test where I run Flink on YARN with a service account and then deploy a job via CLI as another user; in the job I'm trying to access a private folder of the former on HDFS but the job fails due to permission issues (the user running the job is actually the one who ran Flink on YARN in the first place — the service account).

Re: Kerberos on YARN: delegation or proxying?

stefanobaghino
One last note: initially I tried to run the session as the same OS user, running kdestroy and then kinit with the other user, and got this error. When I instead run the job from a different OS session, authenticated with Kerberos as the user who should run the job, I can't connect to the JobManager. I've added a second log with this error to the gist.

On Sun, Mar 6, 2016 at 9:01 PM, Stefano Baghino <[hidden email]> wrote:
In the initial description, I meant "I'm trying to access a private folder of the latter", so not the service account. Sorry for the mistake.

Re: Kerberos on YARN: delegation or proxying?

Maximilian Michels
Hi Stefano,

That is currently a limitation of the Kerberos implementation. The
Kerberos authentication is performed only once, when the Flink cluster
is brought up. The Yarn session is then tied to a particular user's
ticket. Note that you need Hadoop version 2.6.1 or higher to run
long-running jobs because there is a bug in the Kerberos client
that may let the ticket expire.

The workaround you already mentioned is to use a per-job Yarn cluster.
There is currently no plan to delegate the user token per job but we
could certainly think about implementing this in the future.
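
A minimal sketch of the per-job workaround, assuming the Flink 1.0 CLI, where "-m yarn-cluster" starts a dedicated YARN cluster for a single job and the -yn, -ytm and -ys options mirror the -n, -tm and -s options of yarn-session.sh. The resource values and paths are simply reused from the commands earlier in the thread, and the RUN switch is a hypothetical convenience so the command can be inspected before it is executed. Because the cluster is started by whoever submits the job, it should run with that user's Kerberos ticket, provided one has been obtained beforehand (for example with kinit).

```shell
# Per-job YARN cluster: the cluster is started by the submitting user,
# so it authenticates with that user's Kerberos ticket (run kinit first).
USER_DIR="hdfs:///user/stefano.baghino"   # path taken from the thread

# Build the command line as positional parameters (POSIX sh, no arrays).
set -- bin/flink run -m yarn-cluster -yn 2 -ytm 4096 -ys 3 \
  examples/batch/WordCount.jar \
  --input "$USER_DIR/hamlet.txt" \
  --output "$USER_DIR/hamlet.out"
CMD="$*"

# RUN is a hypothetical switch: print the command unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$CMD"
fi
```

Run it from the Flink directory with RUN=1 to actually submit the job; without it, the script only prints the command it would run.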

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#kerberos

Cheers,
Max

On Sun, Mar 6, 2016 at 9:27 PM, Stefano Baghino
<[hidden email]> wrote:

> One last note: initially I tried to run the session as the same OS user,
> running kdestroy and then kinit with the other user, having this error.
> Trying to run the job in a different OS session, authenticating with
> Kerberos as the user who should run the job, I can't connect to the
> JobManager. I've added a second log with this error to the gist.

Re: Kerberos on YARN: delegation or proxying?

stefanobaghino
Ok, thank you for the very detailed explanation!

On Sun, Mar 6, 2016 at 10:02 PM, Maximilian Michels <[hidden email]> wrote:
Hi Stefano,

That is currently a limitation of the Kerberos implementation. The
Kerberos authentication is performed only once, when the Flink cluster
is brought up. The Yarn session is then tied to a particular user's
ticket. Note that you need Hadoop version 2.6.1 or higher to run
long-running jobs because there is a bug in the Kerberos client
that may let the ticket expire.
