Flink - Pod Identity

Flink - Pod Identity

Swagat Mishra
Hi,

I think Flink doesn't support pod identity. Are there any plans to achieve it in a subsequent release?

Regards,
Swagat



Re: Flink - Pod Identity

Israel Ekpo
Are you running on Azure Kubernetes Service?

You should be able to do it, because the identity can be mapped to the labels of the pods, not necessarily to Flink.


Re: Flink - Pod Identity

Swagat Mishra
No, we are running on AWS. The mechanisms Flink supports for connecting to resources like S3 require changes that would impact all our services, which we don't want to do. So providing the AWS access key ID and secret key upfront, or using IAM roles where it connects to S3 by executing curl/HTTP calls, doesn't work for me.

I want to be able to connect to S3 using the AWS APIs, and if that connection can be leveraged by the Presto library, that is what I am looking for.

Regards,
Swagat


Re: Flink - Pod Identity

austin.ce
Hi Swagat,

I’ve used kube2iam[1] for granting AWS access to Flink pods in the past with good results. It’s all based on mapping pod annotations to AWS IAM roles. Is this something that might work for you?
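
To illustrate the annotation-based mapping, here is a minimal sketch in Python that builds a pod manifest carrying kube2iam's `iam.amazonaws.com/role` annotation. The pod name, image, and role name are hypothetical placeholders, not values from this thread.

```python
import json

# kube2iam reads this annotation on a pod to decide which IAM role to assume.
KUBE2IAM_ROLE_ANNOTATION = "iam.amazonaws.com/role"


def pod_manifest_with_role(pod_name: str, image: str, role_name: str) -> dict:
    """Return a minimal pod manifest annotated for kube2iam."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "annotations": {KUBE2IAM_ROLE_ANNOTATION: role_name},
        },
        "spec": {"containers": [{"name": pod_name, "image": image}]},
    }


# Hypothetical names, for illustration only.
manifest = pod_manifest_with_role("flink-taskmanager", "flink:1.12", "flink-s3-access")
print(json.dumps(manifest, indent=2))
```

In a real deployment the same annotation would go on the pod template of the JobManager and TaskManager deployments rather than on a hand-built pod.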

Best,
Austin



Re: Flink - Pod Identity

Swagat Mishra
Yes, I looked at kube2iam, but I haven't experimented with it.

Given that the service account has access to S3, shouldn't we have a simpler mechanism to connect to underlying resources based on the service account authorization?


Re: Flink - Pod Identity

austin.ce
Can you describe your setup a little bit more? And perhaps how you use this setup to grant access to other non-Flink pods?


Re: Flink - Pod Identity

austin.ce
If you’re just looking to attach a service account to a pod using the native AWS EKS IAM mapping[1], you should be able to attach the service account to the pod via the `kubernetes.service-account` configuration option[2]. 
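
As a rough sketch of what this could look like: with the EKS mapping, the service account carries an `eks.amazonaws.com/role-arn` annotation, and Flink is pointed at that account through `kubernetes.service-account`. The Python below only builds those two pieces of configuration as data. The account name, namespace, and role ARN are made-up placeholders.

```python
import json

# Annotation that EKS uses to associate a service account with an IAM role.
EKS_ROLE_ARN_ANNOTATION = "eks.amazonaws.com/role-arn"


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Return a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {EKS_ROLE_ARN_ANNOTATION: role_arn},
        },
    }


def flink_conf_line(service_account: str) -> str:
    """Return the flink-conf.yaml entry that selects the service account."""
    return f"kubernetes.service-account: {service_account}"


# Hypothetical values, for illustration only.
sa = service_account_manifest(
    "flink-sa", "flink", "arn:aws:iam::111122223333:role/flink-s3-access"
)
print(json.dumps(sa, indent=2))
print(flink_conf_line("flink-sa"))
```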

Let me know if that works for you!

Best,
Austin 



Re: Flink - Pod Identity

Sameer Wadkar
kube2iam needs to modify iptables to proxy calls to the EC2 metadata endpoint to a DaemonSet, which runs privileged pods that map a pod's IP address and its associated service account to make STS calls and return temporary AWS credentials. Your pod “thinks” the EC2 metadata URL works locally, as it would on an EC2 instance.

I have found that mutating webhooks are easier to deploy when you have no control over the Kubernetes environment (say, you cannot change iptables or run privileged pods). These can configure the ~/.aws/credentials file. The webhook can make the STS call for the service-account-to-role mapping. A sidecar container to which the main container has no access can even renew the credentials, because STS returns temporary credentials.
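
A minimal sketch of the file-writing half of that idea, in Python: given temporary credentials (as STS would return them), render the contents of a `~/.aws/credentials` file. The STS call itself is left out, and the credential values below are fake placeholders. The function name is hypothetical.

```python
import configparser
import io


def render_credentials_file(
    access_key_id: str,
    secret_access_key: str,
    session_token: str,
    profile: str = "default",
) -> str:
    """Render temporary credentials in the shared credentials file format."""
    # Interpolation is disabled so that values are written exactly as given.
    parser = configparser.ConfigParser(interpolation=None)
    parser[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        # Temporary credentials are only valid together with the session token.
        "aws_session_token": session_token,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Fake placeholder values, for illustration only.
content = render_credentials_file("ASIAEXAMPLE", "secretExample", "tokenExample")
print(content)
```

A renewing sidecar would rewrite this file before the credentials expire. Writing to a temporary file and renaming it into place avoids the main container reading a half-written file.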


Re: Flink - Pod Identity

Swagat Mishra
Austin - 

In my case, the services are deployed as Docker containers on Kubernetes, running on EKS. There is also an Istio service mesh. All the services communicate with and access AWS resources like S3 using the service account, which is associated with IAM roles. I have verified that the service account has access to S3 by running a program that connects to S3 to read a file. The AWS client, when packaged into the pod, is also able to access S3. So the roles and policies are good.

When I am running flink, I am following the same configuration for job manager and task manager as provided here:


The exception we are getting is: org.apache.flink.fs.s3presto.shaded.com.amazonaws.SdkClientException: Unable to load credentials from service endpoint.

This happens in the EC2CredentialFetcher class method fetchCredentials - line number 66, when it tries to read resource, effectively executing

I am not setting the variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI because it's not the right way to do it for us; we are on EKS. Similarly, the ~/.aws/credentials file approach will not work for us.

 
At the moment, I haven't tried the Kubernetes service account property you mentioned above. I will try it and let you know how it goes.

Question: do I need to provide any parameters while building the Docker image, or any configuration in the Flink config, to tell Flink that it should always use the service account and not fall into the EC2CredentialFetcher class?
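
For anyone debugging the same thing, a quick way to see which credential sources a pod exposes is to look at its environment. The sketch below is a diagnostic only and does not configure Flink. It assumes the EKS service account mapping injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod. The function name and the labels it returns are made up for this example, and whether the bundled AWS SDK actually honours these variables depends on its version.

```python
import os


def credential_hints(env: dict) -> list:
    """List the credential sources suggested by the given environment."""
    hints = []
    # Set by the EKS service account to IAM role mapping.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        hints.append("web-identity")
    # The container credentials variable mentioned above.
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        hints.append("container-credentials")
    # Static keys supplied directly through the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("static-keys")
    return hints


# Run inside the JobManager or TaskManager container.
print(credential_hints(dict(os.environ)))
```

An empty list means none of these environment-based sources is present, in which case the SDK may fall back to the EC2 metadata endpoint.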

One more thing: we were trying this on Flink 1.6, not on 1.12.

Regards,
Swagat


Re: Flink - Pod Identity

austin.ce
Hi Swagat,

It looks like Flink 1.6 bundles version 1.11.165 of aws-java-sdk-core with the Presto implementation (transitively from Presto 0.185[1]).
The minimum supported version for the ServiceAccount authentication approach is 1.11.704 (see [2]), which was released on Jan 9th, 2020[3], long after Flink 1.6 was released. It looks like even the most recent Presto is on a version below that, concretely 1.11.697 in the master branch[4], so I don't think even upgrading Flink to 1.6+ will solve this, though it looks to me like the AWS dependency is managed better in more recent Flink versions. I'll have more for you on that front tomorrow, after the Easter break.

I think what you would have to do to make this authentication approach work for Flink 1.6 is build a custom version of the flink-s3-fs-presto jar, replacing the bundled AWS dependency with the 1.11.704 version, and then shade it the same way.
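
The version reasoning above can be checked with a small Python sketch. It compares dotted version strings numerically against the 1.11.704 minimum. The helper names are made up for this example, and the version numbers are the ones quoted in this message.

```python
MIN_SDK_VERSION = "1.11.704"  # minimum quoted above for ServiceAccount auth


def parse_version(version: str) -> tuple:
    """Turn '1.11.704' into (1, 11, 704) so comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


def supports_service_account_auth(sdk_version: str) -> bool:
    """True if the SDK version is at or above the quoted minimum."""
    return parse_version(sdk_version) >= parse_version(MIN_SDK_VERSION)


bundled = {
    "Flink 1.6 (via Presto 0.185)": "1.11.165",
    "Presto master": "1.11.697",
    "custom flink-s3-fs-presto build": "1.11.704",
}
for label, version in bundled.items():
    print(label, version, supports_service_account_auth(version))
```

Comparing integer tuples rather than raw strings matters here, since a plain string comparison would rank "1.11.99" above "1.11.704".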

In the meantime, would you mind creating a JIRA ticket with this use case? That'll give you the best insight into the status of fixing this :)

Let me know if that makes sense,
Austin



Re: Flink - Pod Identity

Swagat Mishra
Hi Austin,

Thanks for your reply.

At the moment, I have upgraded to Flink 1.12, but I still see the same issue. I have taken a look at Presto as well. I am looking to experiment with settings like S3_KMS_KEY_ID (provided in the link below). If this doesn't work, I will look to modify the Presto code to have a custom version that supports pod identity through a service account.

Yes, I can create a JIRA ticket for you.


Regards,
Swagat


Re: Flink - Pod Identity

austin.ce
That looks interesting! I've also found the full list of S3 properties[1] for the version of presto-hive bundled with Flink 1.12 (see [2]), which includes an option for a KMS key (hive.s3.kms-key-id).

(also, adding back the user list)


On Mon, Apr 5, 2021 at 4:21 PM Swagat Mishra <[hidden email]> wrote:
Btw, there is also an option to provide a custom credential provider, what are your thoughts on this?

presto.s3.credentials-provider

On Tue, Apr 6, 2021 at 12:43 AM Austin Cawley-Edwards <[hidden email]> wrote:
I've confirmed that for the bundled and shaded AWS dependency, the only way to upgrade it is to build a flink-s3-fs-presto jar with the updated dependency. Let me know if this is feasible for you if the KMS key solution doesn't work.

Best,
Austin

On Mon, Apr 5, 2021 at 2:18 PM Austin Cawley-Edwards <[hidden email]> wrote:
Hi Swagat,

I don't believe there is an explicit configuration option for the KMS key – please let me know if you're able to make that work!

Best,
Austin

On Mon, Apr 5, 2021 at 1:45 PM Swagat Mishra <[hidden email]> wrote:
Hi Austin,

Let me know what you think on my latest email, if the approach might work, or if it is already supported and I am not using the configurations properly. 

Thanks for your interest and support.

Regards,
Swagat

On Mon, Apr 5, 2021 at 10:39 PM Austin Cawley-Edwards <[hidden email]> wrote:
Hi Swagat,

It looks like Flink 1.6 bundles the 1.11.165 version of the aws-java-sdk-core with the Presto implementation (transitively from Presto 0.185[1]).
The minimum support version for the ServiceAccount authentication approach is 1.11.704 (see [2]) which was released on Jan 9th, 2020[3], long after Flink 1.6 was released. It looks like even the most recent Presto is on a version below that, concretely 1.11.697 in the master branch[4], so I don't think even upgrading Flink to 1.6+ will solve this though it looks to me like the AWS dependency is managed better in more recent Flink versions. I'll have more for you on that front tomorrow, after the Easter break.

I think what you would have to do to make this authentication approach work for Flink 1.6 is building a custom version of the flink-s3-fs-presto jar, replacing the bundled AWS dependency with the 1.11.704 version, and then shading it the same way.

In the meantime, would you mind creating a JIRA ticket with this use case? That'll give you the best insight into the status of fixing this :)

Let me know if that makes sense,
Austin


On Sun, Apr 4, 2021 at 3:32 AM Swagat Mishra <[hidden email]> wrote:
Austin - 

In my case the set up is such that services are deployed on Kubernetes with Docker, running on EKS. There is also an istio service mesh. So all the services communicate and access AWS resources like S3 using the service account. Service account is associated with IAM roles. I have verified that the service account has access to S3, by running a program that connects to S3 to read a file also aws client when packaged into the pod is able to access S3. So that means the roles and policies are good.

When I am running flink, I am following the same configuration for job manager and task manager as provided here:


The exception we are getting is - org.apache.flink.fs.s3presto.shaded.com.amazonaws.SDKClientException: Unable to load credentials from service end point. 

This happens in the EC2CredentialFetcher class method fetchCredentials - line number 66, when it tries to read resource, effectively executing

I am not setting the variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI because it's not the right way to do it for us; we are on EKS. Similarly, none of the ~/.aws/credentials file approaches will work for us.
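
As a rough diagnostic for the situation described here, the following Python sketch lists which credential sources are visible from inside a container. The function name and the returned labels are hypothetical. `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` are the variables EKS injects into pods whose service account is mapped to an IAM role. Seeing them only shows that the pod is set up correctly; the shaded SDK inside the Flink jar still has to be new enough to read them.

```python
import os


def visible_credential_sources(env=None, home=None):
    """List the credential sources that appear to be configured here."""
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    sources = []
    # Injected by EKS when the pod's service account is mapped to an IAM role.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        sources.append("web-identity-token")
    # Container credentials endpoint (the variable mentioned in this message).
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        sources.append("container-credentials")
    # Static keys passed in through the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("static-keys")
    # Shared credentials file in the home directory.
    if os.path.isfile(os.path.join(home, ".aws", "credentials")):
        sources.append("shared-credentials-file")
    return sources


if __name__ == "__main__":
    print(visible_credential_sources())
```

If this prints only `web-identity-token` and the job still fails in the EC2 metadata fetcher, that points at the SDK version rather than at the roles or policies.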

 
At the moment, I haven't tried the Kubernetes service account property you mentioned above. I will try it and let you know how it goes.

Question - do I need to provide any parameters while building the Docker image, or any configuration in the Flink config, to tell Flink that for all purposes it should use the service account and not go into the EC2CredentialFetcher class?

One more thing - we were trying this on the 1.6 version of Flink and not the 1.12 version.

Regards,
Swagat

On Sun, Apr 4, 2021 at 8:56 AM Sameer Wadkar <[hidden email]> wrote:
Kube2iam needs to modify iptables to proxy calls to the EC2 metadata endpoint to a DaemonSet that runs privileged pods. Those pods map a pod's IP address to its associated service account, make the STS calls, and return temporary AWS credentials. Your pod “thinks” the EC2 metadata URL works locally, as it would on an EC2 instance.
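
For context, this is roughly the lookup a client performs against the metadata URL, and therefore the traffic such a proxy has to answer. The Python sketch below is illustrative only: `http_get` and `fetch_instance_credentials` are hypothetical names, and it uses the plain unauthenticated GET style without the session-token handshake of newer metadata service versions.

```python
import json
import urllib.request

# Standard EC2 instance metadata path for IAM role credentials.
METADATA_BASE = (
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
)


def http_get(url, timeout=2):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_instance_credentials(fetch=http_get):
    """Two-step lookup: list the role name, then read its credentials."""
    role_names = fetch(METADATA_BASE).split()
    if not role_names:
        raise RuntimeError("no IAM role exposed by the metadata endpoint")
    document = json.loads(fetch(METADATA_BASE + role_names[0]))
    return {
        "access_key_id": document["AccessKeyId"],
        "secret_access_key": document["SecretAccessKey"],
        "session_token": document["Token"],
        "expiration": document["Expiration"],
    }
```

The `fetch` parameter is there so the function can be exercised with a stub instead of a real network call.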

I have found that mutating webhooks are easier to deploy (when you have no control over the Kubernetes environment, say you cannot change iptables or run privileged pods). These can configure the ~/.aws/credentials file. The webhook can make the STS call for the service-account-to-role mapping. A sidecar container to which the main container has no access can even renew the credentials, because STS returns temporary credentials.
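
A minimal sketch of the file-writing step such a sidecar might perform, assuming it has already obtained temporary credentials from STS. The function name is hypothetical and the STS call itself is left out. The three key names are the standard ones for the shared credentials file.

```python
import configparser
import io


def render_credentials_file(access_key_id, secret_access_key, session_token,
                            profile="default"):
    """Render temporary credentials in shared-credentials-file format."""
    # Interpolation is disabled so values are written exactly as given.
    parser = configparser.ConfigParser(interpolation=None)
    parser[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        # Temporary credentials from STS always come with a session token.
        "aws_session_token": session_token,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_credentials_file("AKIDEXAMPLE", "example-secret",
                                  "example-token"))
```

A renewing sidecar would call this again before the credentials expire and overwrite the file on a volume shared with the main container.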


On Apr 3, 2021, at 10:29 PM, Austin Cawley-Edwards <[hidden email]> wrote:


If you’re just looking to attach a service account to a pod using the native AWS EKS IAM mapping[1], you should be able to attach the service account to the pod via the `kubernetes.service-account` configuration option[2]. 

Let me know if that works for you!

Best,
Austin 


On Sat, Apr 3, 2021 at 10:18 PM Austin Cawley-Edwards <[hidden email]> wrote:
Can you describe your setup a little bit more? And perhaps how you use this setup to grant access to other non-Flink pods?

On Sat, Apr 3, 2021 at 2:29 PM Swagat Mishra <[hidden email]> wrote:
Yes I looked at kube2iam, I haven't experimented with it.

Given that the service account has access to S3, shouldn't we have a simpler mechanism to connect to underlying resources based on the service account authorization?


Re: Flink - Pod Identity

austin.ce
And actually, I've found that the correct version of the AWS SDK is included in Flink 1.12; the issue was reported and fixed in FLINK-18676 (see [1]). Since you said you saw this also occur in 1.12, can you share more details about what you saw there?

Best,
Austin


On Mon, Apr 5, 2021 at 4:53 PM Austin Cawley-Edwards <[hidden email]> wrote:
That looks interesting! I've also found the full list of S3 properties[1] for the version of presto-hive bundled with Flink 1.12 (see [2]), which includes an option for a KMS key (hive.s3.kms-key-id).

(also, adding back the user list)


On Mon, Apr 5, 2021 at 4:21 PM Swagat Mishra <[hidden email]> wrote:
Btw, there is also an option to provide a custom credential provider, what are your thoughts on this?

presto.s3.credentials-provider

On Tue, Apr 6, 2021 at 12:43 AM Austin Cawley-Edwards <[hidden email]> wrote:
I've confirmed that for the bundled + shaded aws dependency, the only way to upgrade it is to build a flink-s3-fs-presto jar with the updated dependency. Let me know if this is feasible for you, if the KMS key solution doesn't work.

Best,
Austin

On Mon, Apr 5, 2021 at 2:18 PM Austin Cawley-Edwards <[hidden email]> wrote:
Hi Swagat,

I don't believe there is an explicit configuration option for the KMS key – please let me know if you're able to make that work!

Best,
Austin

On Mon, Apr 5, 2021 at 1:45 PM Swagat Mishra <[hidden email]> wrote:
Hi Austin,

Let me know what you think on my latest email, if the approach might work, or if it is already supported and I am not using the configurations properly. 

Thanks for your interest and support.

Regards,
Swagat


Re: Flink - Pod Identity

Swagat Mishra
I was able to solve the issue by providing a custom version of the presto jar. I will create a ticket and raise a pull request so that others can benefit from it. I will share the details here shortly.

Thanks everyone for your help and support. Especially Austin, he stands out due to his interest in the issue and helping to find ways to resolve it.

Regards,
Swagat


Re: Flink - Pod Identity

austin.ce
Great, glad to hear it Swagat!

Did you end up using Flink 1.6, or were you able to upgrade to Flink 1.12? Could you also link the ticket back here if you've already created it, and make sure it is not a duplicate of FLINK-18676?

Best,
Austin
