S3 Access in eu-central-1


S3 Access in eu-central-1

Dominik Bruhn
Hey everyone,
I've been trying for hours to get Flink 1.3.2 (downloaded for Hadoop 2.7) to
snapshot/checkpoint to an S3 bucket hosted in the eu-central-1 region.
Everything works fine for other regions. I'm running my job on a JobTracker
in local mode. I've searched around and found several hints, most of them
suggesting that setting `fs.s3a.endpoint` should solve it. It doesn't. I'm
also sure that the core-site.xml (see below) is picked up: if I put garbage
into the endpoint, I receive a hostname-not-found error.

The exception I'm getting is:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS
Service: Amazon S3, AWS Request ID: 432415098B0994BC, AWS Error Code:
null, AWS Error Message: Bad Request, S3 Extended Request ID:
1PSDe4EOh7zvfNPdWrwoBKKOtsS/gf9atn5movRzcpvIH2WsR+ptXvXyFyEHXjDb3F9AniXgsBQ=

I read the AWS FAQ, but I don't think that
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#ioexception-400-bad-request
applies to me, as I'm not running the NativeFileSystem.

I suspect this is related to the V4 signing protocol, which is required
for S3 in Frankfurt. Could it be that the aws-sdk version is just too
old? I tried to play around with it, but the Hadoop adapter is
incompatible with newer versions.

I have the following core-site.xml:

<?xml version="1.0"?>
<configuration>
  <property><name>fs.s3.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value></property>
  <property><name>fs.s3a.buffer.dir</name><value>/tmp</value></property>
  <property><name>fs.s3a.access.key</name><value>something</value></property>
  <property><name>fs.s3a.secret.key</name><value>wont-tell</value></property>
  <property><name>fs.s3a.endpoint</name><value>s3.eu-central-1.amazonaws.com</value></property>
</configuration>

Here is my lib folder with the versions of the aws-sdk and the
hadoop-aws integration:
-rw-------    1 root     root       11.4M Mar 20  2014 aws-java-sdk-1.7.4.jar
-rw-r--r--    1 1005     1006       70.0M Aug  3 12:10 flink-dist_2.11-1.3.2.jar
-rw-rw-r--    1 1005     1006       98.3K Aug  3 12:07 flink-python_2.11-1.3.2.jar
-rw-r--r--    1 1005     1006       34.9M Aug  3 11:58 flink-shaded-hadoop2-uber-1.3.2.jar
-rw-------    1 root     root      100.7K Jan 14  2016 hadoop-aws-2.7.2.jar
-rw-------    1 root     root      414.7K May 17  2012 httpclient-4.2.jar
-rw-------    1 root     root      218.0K May  1  2012 httpcore-4.2.jar
-rw-rw-r--    1 1005     1006      478.4K Jul 28 14:50 log4j-1.2.17.jar
-rw-rw-r--    1 1005     1006        8.7K Jul 28 14:50 slf4j-log4j12-1.7.7.jar

Can anyone give me any hints?

Thanks,
Dominik

Re: S3 Access in eu-central-1

Timo Walther
@Patrick: Do you have any advice?




Re: S3 Access in eu-central-1

Dominik Bruhn
Hey,
can anyone give a hint? Does anyone have Flink running with an S3 bucket in Frankfurt/eu-central-1 who can share their config and setup?

Thanks,
Dominik


Re: S3 Access in eu-central-1

Stephan Ewen
Hi!

The endpoint config entry looks correct.
I was looking at this issue to see if there are pointers to anything else, but it looks like the explicit endpoint entry is the most important thing: https://issues.apache.org/jira/browse/HADOOP-13324

I cc-ed Steve Loughran, who is Hadoop's S3 expert (sorry Steve for pulling you in again - I'm still listening and learning about the subtle bits and pieces of S3).
@Steve are S3 V4 endpoints supported in Hadoop 2.7.x already, or only in Hadoop 2.8?

Best,
Stephan




Re: S3 Access in eu-central-1

Stephan Ewen
Got a pointer from Steve that this is answered on Stack Overflow here: https://stackoverflow.com/questions/36154484/aws-java-sdk-manually-set-signature-version
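The gist of that answer: the AWS SDK can be told to sign requests with V4 via a JVM system property. A minimal sketch of the flag (assuming the property name from the SDK's SDKGlobalConfiguration; the fs.s3a.endpoint entry is still needed on top of it):

-Dcom.amazonaws.services.s3.enableV4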

Flink 1.4 contains a specially bundled "fs-s3-hadoop" with a smaller footprint, compatible across Hadoop versions, and based on a later s3a and AWS SDK. With that connector it should work out of the box because it uses a later AWS SDK. You can also use it with earlier Hadoop versions because its dependencies are relocated, so they should not clash or conflict.
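If you want to try that connector, the setup is roughly the following (a sketch; the exact jar name depends on the 1.4 release you download):

cp ./opt/flink-s3-fs-hadoop-1.4.0.jar ./lib/

After that, checkpoint paths can use an s3:// URI directly, with credentials and endpoint configured as before.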






Re: S3 Access in eu-central-1

Dominik Bruhn
Hey Stephan, hey Steve,
that was the right hint: adding that option to the Java options fixed the
problem. Maybe we should add this somehow to our Flink Wiki?
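For anyone hitting the same issue, this is what the fix can look like in flink-conf.yaml (a sketch, assuming the flag is passed through env.java.opts):

env.java.opts: -Dcom.amazonaws.services.s3.enableV4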

Thanks!

Dominik


Re: S3 Access in eu-central-1

Ufuk Celebi
Hey Dominik,

yes, we should definitely add this to the docs.

@Nico: You recently updated the Flink S3 setup docs. Would you mind
adding these hints for eu-central-1 from Steve? I think that would be
super helpful!

Best,

Ufuk


Re: S3 Access in eu-central-1

Nico Kruber
Sorry for the late response,
but I finally got around to adding this workaround to our "common issues"
section with PR https://github.com/apache/flink/pull/5231

Nico


