Re: Processing S3 data with Apache Flink

Posted by Kostiantyn Kudriavtsev on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Processing-S3-data-with-Apache-Flink-tp3046p3062.html

Hi Robert,

thank you very much for your input!

Have you tried that?
With org.apache.hadoop.fs.s3native.NativeS3FileSystem I got further, but now hit a new exception:


Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden

It's really strange, since I gave full permissions to authenticated users and can fetch the target file with s3cmd or S3 Browser from the same PC... I realize this question isn't really for you, but perhaps you have faced the same issue.
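In case it helps with debugging, here is a minimal sketch that issues the same kind of S3 HEAD request directly with the AWS SDK for Java, bypassing the Hadoop connector entirely (assuming the SDK is on the classpath; the bucket and key names below are placeholders, not the real ones). If this also fails with 403, the problem is in the credentials or bucket policy rather than in Flink/Hadoop:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class HeadRequestCheck {
    public static void main(String[] args) {
        // Same credentials as configured for the Hadoop S3 connector (placeholders here)
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("putKeyHere", "putSecretHere"));
        // getObjectMetadata issues an S3 HEAD request, like the one that returned 403
        ObjectMetadata meta = s3.getObjectMetadata("my-bucket-name", "some-file.csv");
        System.out.println("HEAD succeeded, content length: " + meta.getContentLength());
    }
}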

Thanks in advance!
Kostia


On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger <[hidden email]> wrote:
Hi Kostia,

thank you for writing to the Flink mailing list. I actually started to try out our S3 File system support after I saw your question on StackOverflow [1].
I found that our S3 connector is very broken: I had to resolve two more issues with it before I was able to reproduce the exception you reported.

Another Flink committer looked into the issue as well and confirmed it, but there was no solution [2].

So for now, I would say we have to assume that our S3 connector is not working. I will start a separate discussion on the developer mailing list about removing our S3 connector.

The good news is that you can just use Hadoop's S3 File System implementation with Flink.

I used the following Flink program to verify that it works:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class S3FileSystem {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
        // Reads via Hadoop's NativeS3FileSystem, registered for the s3n:// scheme
        DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
        myLines.print();
    }
}
Also, you need to make a Hadoop configuration file available to Flink.
When running Flink locally in your IDE, just create a "core-site.xml" in the src/main/resources folder with the following content:

<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>putKeyHere</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>putSecretHere</value>
  </property>
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
</configuration>
If you are running on a cluster, re-use (i.e., edit) the existing core-site.xml file and point Flink to its directory with the fs.hdfs.hadoopconf configuration option.
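For example (just a sketch; the path below is a placeholder for wherever your cluster keeps its Hadoop configuration), the entry in flink-conf.yaml would look like this:

    fs.hdfs.hadoopconf: /etc/hadoop/conf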

With these two things in place, you should be good to go.


On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <[hidden email]> wrote:
Hi guys,

I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read data from S3. I tried the following path for the data, s3://mybucket.s3.amazonaws.com/folder, but it throws the following exception:

java.io.IOException: Cannot establish connection to Amazon S3: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403;

I added the access and secret keys, so the problem is not there. I'm using the standard region and gave read credentials to everyone.

Any ideas how it can be fixed?

Thank you in advance,
Kostia