Hello,

I would like to access data in Azure blob storage from Flink, via the Azure storage HDFS-compatibility interface. That is feasible from Apache Drill, and I am thinking something similar should be doable from Flink. A documentation page on external storage connectors for Flink exists, but it was written pre-1.0.

Does anyone have experience with setting up an Azure blob connector?

Mikkel
You should be able to follow this:
http://mail-archives.apache.org/mod_mbox/drill-user/201512.mbox/%3CCAAL5oQJQRgqO5LjhG_=YFLyHuZUNqEvm3VX3C=2d9UXnBTok4g@...%3E

It's similar to the AWS S3 config (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html). Add the Azure JARs to Flink (put them in the lib folder), configure the fs.hdfs.hadoopconf key to point to your Hadoop config directory, and update core-site.xml as in the mailing list thread. Then you should be able to access your data via azure://...

I would appreciate some feedback on whether this works as expected.

On Tue, Aug 16, 2016 at 2:37 PM, MIkkel Islay <[hidden email]> wrote:
> I would like to access data in Azure blob storage from Flink, via the
> Azure storage HDFS-compatibility interface. [...]
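For reference, the fs.hdfs.hadoopconf key goes into flink-conf.yaml. A minimal sketch (the directory path here is just an example; point it at wherever your Hadoop config lives):

    # flink-conf.yaml
    # Directory containing core-site.xml with the Azure settings (example path)
    fs.hdfs.hadoopconf: /etc/hadoop/conf

Flink will then pick up the Azure filesystem settings from the core-site.xml in that directory.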
Hello Ufuk,

Thanks for your swift reply. Those are essentially the steps I took for Drill. I will be happy to report back with my success, or otherwise.

Mikkel

On Tue, Aug 16, 2016 at 12:40 PM, Ufuk Celebi <[hidden email]> wrote:
> You should be able to follow this: [...]
I have successfully connected Azure blob storage to Flink 1.1.
Below are the steps necessary:

- Add hadoop-azure-2.7.2.jar (assuming you are using a Hadoop 2.7 Flink binary) and azure-storage-4.3.0.jar to <flinkdir>/lib, and set file permissions / ownership accordingly.

- Add the following to a file 'core-site.xml':

  <property>
    <name>fs.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <property>
    <name>fs.wasbs.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <property>
    <name>fs.azure.account.key.STORAGEACCOUNTNAME.blob.core.windows.net</name>
    <value>ACCOUNTKEY</value>
  </property>

- Update the parameter fs.hdfs.hadoopconf to the path of the directory where core-site.xml is located.

- Restart Flink.

It is now possible to read from blobs (block and page) by referencing 'wasb://CONTAINERNAME@.../PATH' or 'wasbs://CONTAINERNAME@.../PATH'.

Regards,
Lau

On 16 August 2016 at 14:37, MIkkel Islay <[hidden email]> wrote:
> I would like to access data in Azure blob storage from Flink, via the
> Azure storage HDFS-compatibility interface. [...]

--
Lau Sennels
Founder, scaling biologist
https://dk.linkedin.com/pub/lau-sennels/a9/3b5/196
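With the JARs and core-site.xml in place, reading the data from a job is just a matter of using the wasb:// URI. A minimal sketch against the Flink 1.1 DataSet API, where CONTAINERNAME, STORAGEACCOUNTNAME, and PATH are placeholders as in the steps above:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class WasbReadExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // The wasb:// scheme resolves through the NativeAzureFileSystem
            // configured in core-site.xml above.
            DataSet<String> lines = env.readTextFile(
                    "wasb://CONTAINERNAME@STORAGEACCOUNTNAME.blob.core.windows.net/PATH");

            // print() triggers execution and writes the records to stdout
            lines.first(10).print();
        }
    }

Using wasbs:// instead gives an SSL connection, exactly as with the paths above.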