Hi Kostia,
thank you for writing to the Flink mailing list. I actually started trying out our S3 file system support after I saw your question on StackOverflow [1].
I found that our S3 connector is very broken. I had to resolve two more issues with it before I was able to reproduce the exception you reported.
Another Flink committer looked into the issue and confirmed it as well, but there was no solution [2].
So for now, I would say we have to assume that our S3 connector is not working. I will start a separate discussion on the developer mailing list about removing it.
The good news is that you can simply use Hadoop's S3 file system implementation with Flink.
I used the following Flink program to verify that it works:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class S3FileSystem {
    public static void main(String[] args) throws Exception {
        // local environment, for running the job directly from the IDE
        ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
        // the s3n:// scheme goes through Hadoop's NativeS3FileSystem
        DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
        myLines.print();
    }
}
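One caveat: the NativeS3FileSystem class needs to be on the classpath, and the artifact it ships in depends on your Hadoop version. Starting with Hadoop 2.6 the S3 file systems were moved into the separate hadoop-aws module; for older versions the class is part of the core Hadoop artifacts, so no extra dependency should be needed. A minimal sketch of the Maven dependency, assuming Hadoop 2.6.0 (adjust the version to the Hadoop build you are using):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.6.0</version>
</dependency>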
Also, you need to make a Hadoop configuration file available to Flink.
When running Flink locally in your IDE, just create a "core-site.xml" in the src/main/resources folder with the following content:
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>putKeyHere</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>putSecretHere</value>
  </property>
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
</configuration>
If you are running on a cluster instead, re-use the existing core-site.xml file (i.e., edit it) and point Flink to the directory containing it via the fs.hdfs.hadoopconf configuration option, as sketched below.
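For the cluster case, a minimal sketch of the entry in conf/flink-conf.yaml; the /etc/hadoop/conf path is just an assumption, use whatever directory holds your edited core-site.xml:

# directory containing core-site.xml (example path)
fs.hdfs.hadoopconf: /etc/hadoop/conf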
With these two things in place, you should be good to go.