Problem with Amazon S3

pietro
Dear all,
I have been developing a Flink application that has to run on Amazon Elastic MapReduce.

For convenience, the data that the application reads and writes are stored on S3.

However, I have not been able to access S3. This is the error I got:
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (LowLevel.FlinkImplementation.MyWriter@826539e)': Cannot determine access key to Amazon S3

What shall I do?

Many thanks in advance

Re: Problem with Amazon S3

Ufuk Celebi
Hey Pietro!

You have to add the following lines to your flink-conf.yaml:

fs.s3.accessKey: <YOUR ACCESS KEY>
fs.s3.secretKey: <YOUR SECRET KEY>

I will fix the error message to include a hint on how to configure this correctly.

– Ufuk
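
As a quick sanity check on the configuration described above, the following is a minimal sketch that scans "key: value" lines for the two S3 entries. The class S3ConfCheck and its methods are hypothetical helpers written for illustration and are not part of Flink; only the key names fs.s3.accessKey and fs.s3.secretKey come from the reply above. It assumes a flat file without nested YAML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Flink): reports which S3 keys are missing
// from flat "key: value" configuration lines.
final class S3ConfCheck {

    // The two keys named in the reply above.
    static final String[] REQUIRED = {"fs.s3.accessKey", "fs.s3.secretKey"};

    // Parses flat "key: value" lines, skipping blank lines and '#' comments.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> conf = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int sep = trimmed.indexOf(':');
            if (sep <= 0) {
                continue; // not a key/value line
            }
            conf.put(trimmed.substring(0, sep).trim(),
                     trimmed.substring(sep + 1).trim());
        }
        return conf;
    }

    // Returns the required keys that are absent or have an empty value.
    static List<String> missingKeys(List<String> lines) {
        Map<String, String> conf = parse(lines);
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = conf.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the contents of flink-conf.yaml.
        List<String> lines = Arrays.asList(
                "fs.s3.accessKey: <YOUR ACCESS KEY>",
                "# fs.s3.secretKey is not set");
        System.out.println(missingKeys(lines)); // prints [fs.s3.secretKey]
    }
}
```

The check only confirms that both entries are present and non-empty; it cannot tell whether the credentials themselves are valid.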



Re: Problem with Amazon S3

pietro
Thank you Ufuk! That helped a lot.

But I have another problem now.

Am I missing something?

Caused by: java.net.UnknownHostException: MYBUCKETNAME
        at java.net.InetAddress.getAllByName0(InetAddress.java:1250)
        at java.net.InetAddress.getAllByName(InetAddress.java:1162)
        at java.net.InetAddress.getAllByName(InetAddress.java:1098)
        at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.resolveHostname(DefaultClientConnectionOperator.java:278)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:162)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:641)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:402)
        ... 34 more

        at org.apache.flink.runtime.fs.s3.S3FileSystem.initializeDirectoryStructure(S3FileSystem.java:248)
        at org.apache.flink.runtime.fs.s3.S3FileSystem.initialize(S3FileSystem.java:222)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:258)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:310)
        at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:402)
        at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:145)
        ... 23 more

Re: Problem with Amazon S3

Stephan Ewen
It looks like the S3 URL is in an unexpected format: the bucket name is being used as the hostname. Can you tell us the S3 URL (without user / password), so we can take a look?

Greetings,
Stephan
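
To illustrate the diagnosis above, here is a small sketch using only the standard java.net.URI class. It shows that in a URL of the form s3://NAME/path, generic URI parsing treats NAME as the host. The class S3UrlParts and the example path are hypothetical, and whether Flink's S3FileSystem parses the URL in exactly this way is an assumption; the sketch is merely consistent with the UnknownHostException in the stack trace.

```java
import java.net.URI;

// Hypothetical demo class: shows how a generic URI parser splits an s3:// URL.
final class S3UrlParts {

    // Returns the authority host of the URL (the part right after "s3://").
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns the path component of the URL (everything after the host).
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        // Placeholder URL; MYBUCKETNAME mirrors the name in the stack trace.
        String url = "s3://MYBUCKETNAME/path/to/file.txt";
        System.out.println(hostOf(url)); // prints MYBUCKETNAME
        System.out.println(pathOf(url)); // prints /path/to/file.txt
    }
}
```

If the host component is handed to an HTTP client unchanged, a bare bucket name will not resolve via DNS, which matches the error reported.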



Re: Problem with Amazon S3

pietro
Hi Stephan,

Sure, this is how I try to read the file in Flink:

env.readFile(new DefaultReader(), "s3://genomic/flink/input/meta/1.txt").map(parser(_))

This is the same URL format I used in Pig.

Thanks,

Re: Problem with Amazon S3

Ufuk Celebi
Hey Pietro,

I've debugged this locally and I can get a connection to an S3 bucket with the following format:

s3://<BUCKET NAME>.s3.amazonaws.com/<KEY>

Depending on the region of your S3 bucket, you have to use a different endpoint (http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region).

So for example: s3://genomic.s3.amazonaws.com/flink/input/meta/1.txt (assuming that "genomic" is your bucket in US Standard).

Does this work? Sorry for the inconvenience so far.
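
The URL format above can be sketched as a small string-building helper. The class S3Urls and its build method are hypothetical names used for illustration only; the endpoint argument has to be looked up in the AWS endpoint list for the bucket's region, and s3-us-west-2.amazonaws.com is used here just as an example of a regional endpoint.

```java
// Hypothetical helper: assembles s3://<BUCKET NAME>.<endpoint>/<KEY>.
final class S3Urls {

    // Joins bucket, endpoint, and key; a leading '/' on the key is dropped
    // so the result never contains a double slash after the host.
    static String build(String bucket, String endpoint, String key) {
        String cleanKey = key.startsWith("/") ? key.substring(1) : key;
        return "s3://" + bucket + "." + endpoint + "/" + cleanKey;
    }

    public static void main(String[] args) {
        // US Standard endpoint, as in the example above.
        System.out.println(build("genomic", "s3.amazonaws.com", "flink/input/meta/1.txt"));
        // Example of a regional endpoint (assumes the bucket lives in us-west-2).
        System.out.println(build("genomic", "s3-us-west-2.amazonaws.com", "/flink/input/meta/1.txt"));
    }
}
```

The first call yields the URL quoted in the example above; the second differs only in the endpoint part of the host.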




Re: Problem with Amazon S3

pietro
Dears,

I am still having problems retrieving data from S3. I followed all your suggestions in the previous posts, but now I get this error:

15/05/20 10:47:05 INFO s3.S3FileSystem: Creating new S3 file system binding with Reduced Redundancy Storage enabled
15/05/20 10:47:13 WARN io.DelimitedInputFormat: Could not determine statistics for file 's3://genomic.s3-us-west-2.amazonaws.com/flink/ref/meta' due to an io error: Cannot establish connection to Amazon S3: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: E719C84656C22D70), S3 Extended Request ID: 5yE3QhMxlrVuCiPe5lN/cVAWptceXRNuSUmIG9kwRtioimOX3znU4Fj3aY7+P1MTR4BTecyTvVM=



I checked the keys in the flink-conf.yaml and they are correct.

Any idea?

Thanks

Re: Problem with Amazon S3

Aljoscha Krettek
I am getting the same error as you are. Investigating now.


Re: Problem with Amazon S3

rmetzger0
Flink also lets you use Hadoop's FileSystem interface [1].

Hadoop ships an S3 file system implementation by default, and I suspect it's in better shape than Flink's implementation. It may make sense to use Hadoop's S3 implementation through Flink's Hadoop FS support.

Please let me know if you are facing any issues while using this approach.

