StreamingFileSink to a S3 Bucket on a remote account using STS


StreamingFileSink to a S3 Bucket on a remote account using STS

orionemail
Hi,

I'm new to both AWS and Flink, but I currently need to write incoming data into an S3 bucket on a remote account, accessed via AWS temporary (STS) credentials.

I am unable to get this to work, and I am not entirely sure of the steps needed. I can write fine to S3 buckets that are not 'remote' and do not require STS temporary credentials.

I am using Flink 1.9.1, as that is what the job will run on when deployed live in EMR.

My flink-conf.yaml contains the following entries:

    fs.s3a.bucket.sky-rdk-telemetry.aws.credentials.provider: org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
    fs.s3a.bucket.sky-rdk-telemetry.assumed.role.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    fs.s3a.bucket.sky-rdk-telemetry.access-key: xxxxx
    fs.s3a.bucket.sky-rdk-telemetry.secret-key: xxxx
    fs.s3a.bucket.sky-rdk-telemetry.assumed.role.arn: xxxx
    fs.s3a.bucket.sky-rdk-telemetry.assumed.role.session.name: xxxx
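
The job side is just the stock row-format StreamingFileSink; a minimal sketch of my test looks like the following (the class name, output prefix, and dummy source are placeholders; the bucket name matches the config above):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class S3StsSinkTest {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // StreamingFileSink relies on checkpoints to move part files to the finished state
            env.enableCheckpointing(60_000);

            // stand-in source; the real job reads the incoming telemetry
            DataStream<String> input = env.fromElements("record-1", "record-2");

            StreamingFileSink<String> sink = StreamingFileSink
                    .forRowFormat(new Path("s3a://sky-rdk-telemetry/output"),
                                  new SimpleStringEncoder<String>("UTF-8"))
                    .build();

            input.addSink(sink);
            env.execute("s3-sts-sink-test");
        }
    }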

And my POM contains

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-bom</artifactId>
            <version>1.11.700</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-sts</artifactId>
    <version>1.11.700</version>
</dependency>


I have put the jar flink-s3-fs-hadoop-1.9.1.jar into the plugins directory.

Running my test JAR, I am getting ClassNotFound exceptions for org/apache/flink/fs/s3base/shaded/com/amazonaws/services/securitytoken/model/AWSSecurityTokenServiceException.

Poking around, I see this class is shaded into a package in the Kinesis connector. I have added some rules to the Maven Shade plugin to rewrite the package as needed, but this still doesn't help.

Am I heading in the right direction? Searching has not turned up much information that I have been able to make use of.

Thanks for your time,

J





Sent with ProtonMail Secure Email.


Re: StreamingFileSink to a S3 Bucket on a remote account using STS

rmetzger0
Hey,
sorry for the late response. 

Can you provide the full exception(s), including the stack trace you are seeing?



Fwd: StreamingFileSink to a S3 Bucket on a remote account using STS

Arvid Heise-3
In reply to this post by orionemail
Oh shoot, I replied privately. Let me forward the responses, including the solution, to the list.

---------- Forwarded message ---------
From: orionemail <[hidden email]>
Date: Thu, Apr 23, 2020 at 3:38 PM
Subject: Re: StreamingFileSink to a S3 Bucket on a remote account using STS
To: Arvid Heise <[hidden email]>


Hi,

Thanks for your assistance. With your pointers I have confirmed that I am able to write to a remote bucket managed with temporary/STS credentials.

It did indeed take a lot of fiddling to get working, and I am not quite sure that what I did is safe for production. However, I'll briefly mention what I did here in case it helps anyone else.

1. As suggested, I moved flink-s3-fs-hadoop-1.9.1.jar into the lib directory rather than the plugins directory.
2. I then removed the dependency on aws-java-sdk-sts from my pom.xml, along with the shading and class relocations I had originally tried. This was not enough on its own, so I decided it would be cleaner to modify the original AWS SDK pom.xml files instead.
3. I downloaded the source for the AWS Java SDK and modified the aws-sdk-core and aws-sdk-sts pom.xml files to apply the shading.

aws-sdk-sts/pom.xml

<relocations>
    <relocation>
        <pattern>com.amazonaws</pattern>
        <shadedPattern>org.apache.flink.fs.s3base.shaded.com.amazonaws</shadedPattern>
        <includes>
            <include>com.amazonaws.**</include>
        </includes>
    </relocation>
</relocations>

aws-sdk-core/pom.xml 

<relocations>
    <relocation>
        <pattern>com</pattern>
        <shadedPattern>org.apache.flink.fs.s3base.shaded.com</shadedPattern>
    </relocation>
    <relocation>
        <pattern>org</pattern>
        <shadedPattern>org.apache.flink.fs.shaded.hadoop3.org</shadedPattern>
        <excludes>
            <exclude>org.apache.http.**</exclude>
        </excludes>
    </relocation>
    <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>org.apache.flink.fs.s3base.shaded.org.apache.http</shadedPattern>
    </relocation>
</relocations>

This was enough to get my test to work.  
Does this look sensible, or could I have done this in a simpler way?

Thanks again for your help.





‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, 22 April 2020 16:02, Arvid Heise <[hidden email]> wrote:

Hi,

it seems some classes have not been properly shipped in the plugin. Sorry about that!

Your approach in general looks good, but the plugin mechanism is causing you some trouble in this case. The issue is that in Flink 1.9 we still shaded packages, as you have figured out. However, whatever you bundle in your user code jar is hidden from the plugin, because the plugin uses a different classloader (that isolation is the whole motivation of plugins).

So a quick fix for 1.9 would actually be to not use flink-s3-fs-hadoop as a plugin, but to put it into lib or even bundle it in your user jar. Then your additionally bundled and relocated classes should be visible to the S3 filesystem.

For 1.10, we got rid of the relocations in the s3 plugins (finally!). There it would be enough to put aws-java-sdk-sts.jar inside the plugin's folder (that's why you need to put your jar inside a folder under plugins: each folder represents a logical classpath).
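
For example (the folder name and exact jar versions here are illustrative), the plugins layout would look like:

    plugins/
        s3-fs-hadoop/
            flink-s3-fs-hadoop-1.10.0.jar
            aws-java-sdk-sts-1.11.700.jar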

Best,

Arvid




--

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   


