Impersonation support in Flink


Impersonation support in Flink

Chan, Regina

Hi folks,

 

Is Flink able to do impersonation using UserGroupInformation? How can we make all the tasks run as the impersonated user without having to set it up per task?

 

 

UserGroupInformation ugi =
        UserGroupInformation.createProxyUser(proxyUser, UserGroupInformation.getLoginUser());

PrivilegedExceptionAction<Void> iAction = new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws Exception {
        action.run();
        return null;
    }
};

ugi.doAs(iAction);
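The wrapping above can be factored into one helper so that individual tasks do not repeat it. What follows is only a minimal sketch under stated assumptions: `Impersonator`, `wrap` and `RecordingImpersonator` are hypothetical names introduced for illustration, and the example uses only JDK classes so it runs without Hadoop on the classpath. In real code a thin adapter around `ugi.doAs` would presumably take the place of the recording stand-in. Note that this only removes boilerplate in user code; it does not by itself make Flink run its tasks as the proxy user.

```java
import java.security.PrivilegedExceptionAction;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class DoAsWrapperSketch {

    // Hypothetical abstraction for "run this action as some user".
    // An adapter around Hadoop's ugi.doAs(...) could implement it (assumption).
    interface Impersonator {
        <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception;
    }

    // Wraps a task once so the call site does not repeat the doAs boilerplate.
    static <T> Callable<T> wrap(Impersonator impersonator, Callable<T> task) {
        return () -> impersonator.doAs(task::call);
    }

    // Stand-in used only for this demo: records entry and exit
    // instead of actually switching the effective user.
    static class RecordingImpersonator implements Impersonator {
        final String user;
        final List<String> log = new ArrayList<>();

        RecordingImpersonator(String user) {
            this.user = user;
        }

        @Override
        public <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
            log.add("enter " + user);
            try {
                return action.run();
            } finally {
                log.add("exit " + user);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // "alice" is a hypothetical user name.
        RecordingImpersonator impersonator = new RecordingImpersonator("alice");
        Callable<Integer> task = wrap(impersonator, () -> 41 + 1);
        System.out.println(task.call());      // prints 42
        System.out.println(impersonator.log); // prints [enter alice, exit alice]
    }
}
```

The wrapper is applied where a task is created, so the task body itself stays free of any UserGroupInformation code.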

 

 

 

Regina Chan

Goldman Sachs Enterprise Platforms, Data Architecture

30 Hudson Street, 37th floor | Jersey City, NY 07302 | (212) 902-5697

 


Re: Impersonation support in Flink

Eron Wright
Hello,
Flink does initialize the process-wide login user, using the UGI's Kerberos login method.  It doesn't support proxy user at the moment.   Let's dig into the scenario a bit to see how best to support it.

As you know, Hadoop's proxy user functionality allows a process that has superuser credentials to impersonate a normal user when making remote calls to HDFS and other remote services. A possible scenario would be that the Flink cluster has a superuser account and accesses HDFS on behalf of someone else. Keep in mind that job code runs with full trust within the JM/TM and would have access to the superuser keytab. Does that sound like your scenario?

Proxy user support would not facilitate the scenario of running a user's job code such that the job accesses HDFS as that user.   The only way to support that scenario is by launching the cluster using that user's keytab.

I hope this helps,
Eron

On Mon, Oct 23, 2017 at 10:52 AM, Chan, Regina <[hidden email]> wrote:



RE: Impersonation support in Flink

Newport, Billy

Our scenario is to enable a specific Kerberos principal to impersonate any Kerberos principal in a specific group; this is enabled in the HDFS configuration. That principal does not need to be root, just a principal allowed to impersonate the users in that group.
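For readers unfamiliar with the Hadoop side: this kind of right is granted per impersonating account through the `hadoop.proxyuser.<name>.groups` and `hadoop.proxyuser.<name>.hosts` properties in core-site.xml. The sketch below only assembles those property names and values with plain JDK classes, so it runs without Hadoop. The account `svc_etl`, the group `analysts` and the host name are hypothetical values chosen for illustration, not the ones used in our setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProxyUserConfigSketch {

    // Builds the core-site.xml property pairs that allow `impersonator`
    // to act on behalf of members of `group` when connecting from `hosts`.
    static Map<String, String> proxyUserProperties(String impersonator, String group, String hosts) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hadoop.proxyuser." + impersonator + ".groups", group);
        props.put("hadoop.proxyuser." + impersonator + ".hosts", hosts);
        return props;
    }

    public static void main(String[] args) {
        // All three argument values are hypothetical.
        proxyUserProperties("svc_etl", "analysts", "gateway1.example.com")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Restricting both the group and the hosts keeps the impersonating account from acting as arbitrary users or from arbitrary machines.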

 

We want the job to access HDFS as the impersonated principal, not as the one that launched it. We do this with our MR jobs by simply impersonating in the driver; all the mappers and reducers then run correctly and use the impersonated user that was active when the job was submitted. We expected Flink to work similarly and ran into this issue.

 

We do this without the keytab for that user; if we had it, we wouldn’t need to impersonate, if you see what I mean.

 

So, what kind of changes would be needed, and where, to implement this function? We are happy to write the patch to enable this behavior.

 

Billy

 

 

From: Eron Wright [mailto:[hidden email]]
Sent: Monday, October 23, 2017 4:53 PM
To: Chan, Regina [Tech]
Cc: [hidden email]
Subject: Re: Impersonation support in Flink

 
