Azure Blob Storage Connector

MIkkel Islay
Hello,

I would like to access data in Azure blob storage from Flink, via the Azure storage HDFS-compatibility interface.
That is feasible from Apache Drill, and I am thinking something similar should be doable from Flink. A documentation page on external storage connectors for Flink exists, but it was written before 1.0.
Does anyone have experience with setting up an Azure blob connector?

Mikkel

Re: Azure Blob Storage Connector

Ufuk Celebi
You should be able to follow this:

http://mail-archives.apache.org/mod_mbox/drill-user/201512.mbox/%3CCAAL5oQJQRgqO5LjhG_=YFLyHuZUNqEvm3VX3C=2d9UXnBTok4g@...%3E

It's similar to the AWS S3 config
(https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html).

Add the Azure JARs to Flink (put them in the lib folder), set
the fs.hdfs.hadoopconf key in flink-conf.yaml to point to your Hadoop
config directory, and update core-site.xml as described in the mailing
list thread.

Then you should be able to access your data via azure://...

I would appreciate feedback on whether this works as expected.


On Tue, Aug 16, 2016 at 2:37 PM, MIkkel Islay <[hidden email]> wrote:

> Hello,
>
> I would like to access data in Azure blob storage from Flink, via the Azure
> storage HDFS-compatibility interface.
> That is feasible from Apache Drill, and I am thinking something similar
> should be doable from Flink. A documentation page on external storage
> connectors for Flink exists, but it was written before 1.0.
> Does anyone have experience with setting up an Azure blob connector?
>
> Mikkel

Re: Azure Blob Storage Connector

MIkkel Islay
Hello Ufuk,

Thanks for your swift reply.
Those are essentially the steps I took for Drill. I am happy to report back with my success, or otherwise.

Mikkel


Re: Azure Blob Storage Connector

Lau Sennels
I have successfully connected Azure blob storage to Flink 1.1.

Below are the steps necessary:
- Add hadoop-azure-2.7.2.jar (assuming you are using a Hadoop 2.7 Flink binary) and azure-storage-4.3.0.jar to <flinkdir>/lib, and set file permissions / ownership accordingly.
- Add the following to a file 'core-site.xml' (the properties go inside the file's top-level <configuration> element):

<configuration>
    <property>
        <name>fs.wasb.impl</name>
        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
    </property>

    <property>
        <name>fs.wasbs.impl</name>
        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
    </property>

    <property>
        <name>fs.azure.account.key.STORAGEACCOUNTNAME.blob.core.windows.net</name>
        <value>ACCOUNTKEY</value>
    </property>
</configuration>

- In flink-conf.yaml, set the parameter fs.hdfs.hadoopconf to the path of the directory where core-site.xml is located.
- Restart Flink
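
For anyone who would rather script the core-site.xml step, here is a minimal Python sketch that renders the three properties listed above inside a <configuration> element. The function name build_core_site and its arguments are illustrative assumptions, not part of Flink or Hadoop, so treat it as a starting point rather than a tested deployment tool.

```python
from xml.sax.saxutils import escape

# Hadoop file system class used for both the wasb and wasbs schemes.
NATIVE_AZURE_FS = "org.apache.hadoop.fs.azure.NativeAzureFileSystem"


def build_core_site(account, key):
    """Return core-site.xml text for one storage account (hypothetical helper)."""
    props = [
        ("fs.wasb.impl", NATIVE_AZURE_FS),
        ("fs.wasbs.impl", NATIVE_AZURE_FS),
        ("fs.azure.account.key.%s.blob.core.windows.net" % account, key),
    ]
    lines = ["<configuration>"]
    for name, value in props:
        lines.append("    <property>")
        lines.append("        <name>%s</name>" % escape(name))
        lines.append("        <value>%s</value>" % escape(value))
        lines.append("    </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute your own account name and key.
    print(build_core_site("STORAGEACCOUNTNAME", "ACCOUNTKEY"))
```

The output can be written to core-site.xml in the directory that fs.hdfs.hadoopconf points to. Since the file holds the account key, restrict its permissions accordingly.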

It is now possible to read from blobs (block and page) by referencing 'wasb://CONTAINERNAME@.../PATH' or
'wasbs://CONTAINERNAME@.../PATH'
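
As a rough illustration of the URI layout, the Python sketch below assembles a wasb/wasbs path from its parts. It assumes the host part is the storage account name followed by .blob.core.windows.net, matching the account key property name in core-site.xml. The helper name wasb_uri is hypothetical.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a wasb:// or wasbs:// URI for a blob path (hypothetical helper).

    Assumes the standard public endpoint suffix blob.core.windows.net.
    """
    scheme = "wasbs" if secure else "wasb"
    # Drop leading slashes so the result has exactly one slash after the host.
    return "%s://%s@%s.blob.core.windows.net/%s" % (
        scheme, container, account, path.lstrip("/"))


if __name__ == "__main__":
    # Placeholder names; substitute your own container, account and path.
    print(wasb_uri("CONTAINERNAME", "STORAGEACCOUNTNAME", "PATH"))
```

The resulting string is what you would pass wherever Flink accepts a file path, for example as the input path of a file source.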

Regards,
Lau





--
Lau Sennels
Founder, scaling biologist
https://dk.linkedin.com/pub/lau-sennels/a9/3b5/196