Azure Blob Storage Connector

MIkkel Islay

I would like to access data in Azure blob storage from Flink, via the Azure storage HDFS-compatibility interface.
That is feasible from Apache Drill, and I think something similar should be doable from Flink. A documentation page on external storage connectors for Flink exists, but it was written before 1.0.
Does anyone have experience with setting up an Azure blob connector?


Re: Azure Blob Storage Connector

Ufuk Celebi
You should be able to follow this:

It's similar to the AWS S3 config

Add the Azure JARs to Flink's lib folder, configure the
fs.hdfs.hadoopconf key to point to your Hadoop config directory,
and update core-site.xml as described in the mailing list thread.

Then you should be able to access your data via azure://...

I would appreciate feedback on whether this works as expected.

On Tue, Aug 16, 2016 at 2:37 PM, MIkkel Islay <[hidden email]> wrote:

> Hello,
> I would like to access data in Azure blob storage from Flink, via the Azure
> storage HDFS-compatibility interface.
> That is feasible from Apache Drill, and I am thinking something similar
> should be doable from Flink. A documentation page on external storage
> connectors for Flink exists, but it was written before 1.0.
> Does anyone have experience with setting up an Azure blob connector?
> Mikkel

Re: Azure Blob Storage Connector

MIkkel Islay
Hello Ufuk,

Thanks for your swift reply.
Those are essentially the steps I took for Drill. I am happy to report back with my success, or otherwise.


On Tue, Aug 16, 2016 at 12:40 PM, Ufuk Celebi <[hidden email]> wrote:
> You should be able to follow this:
>
> It's similar to the AWS S3 config
>
> Add the Azure JARs to Flink's lib folder, configure the
> fs.hdfs.hadoopconf key to point to your Hadoop config directory,
> and update core-site.xml as described in the mailing list thread.
>
> Then you should be able to access your data via azure://...
>
> I would appreciate feedback on whether this works as expected.


Re: Azure Blob Storage Connector

Lau Sennels
I have successfully connected Azure blob storage to Flink 1.1.

Below are the steps necessary:
- Add hadoop-azure-2.7.2.jar (assuming you are using a Hadoop 2.7 Flink binary) and azure-storage-4.3.0.jar to <flinkdir>/lib, and set file permissions / ownership accordingly.
- Add the following to a file 'core-site.xml'




- Set the parameter fs.hdfs.hadoopconf in flink-conf.yaml to the path of the directory where core-site.xml is located.
- Restart Flink
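
The XML that belonged to the core-site.xml step did not survive in the archived message, so what follows is only a sketch of what such a file might contain, not the configuration from the post. It assumes the property names used by the hadoop-azure module (fs.wasb.impl, fs.AbstractFileSystem.wasb.impl and fs.azure.account.key.<account>.blob.core.windows.net). MYACCOUNT and MYKEY are hypothetical placeholders, and build_core_site is a helper name made up for illustration.

```python
from xml.sax.saxutils import escape


def build_core_site(account, access_key):
    """Return core-site.xml text for wasb:// access to one storage account.

    Assumption: these are the hadoop-azure property names; the original
    post's XML is missing, so this is a guess at its general shape.
    """
    props = [
        # File system classes shipped in the hadoop-azure JAR.
        ("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem"),
        ("fs.AbstractFileSystem.wasb.impl", "org.apache.hadoop.fs.azure.Wasb"),
        # Access key for the storage account (placeholder value in the demo).
        ("fs.azure.account.key.%s.blob.core.windows.net" % account, access_key),
    ]
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props:
        lines += [
            "  <property>",
            "    <name>%s</name>" % escape(name),
            "    <value>%s</value>" % escape(value),
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical account name and key; replace with real values.
    print(build_core_site("MYACCOUNT", "MYKEY"), end="")
```

Redirecting the script's output to core-site.xml in the directory that fs.hdfs.hadoopconf points to should give Flink a starting configuration; whether all three properties are needed may depend on the Hadoop version.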

It is now possible to read from blobs (block and page) by referencing 'wasb://CONTAINERNAME@.../PATH' or


Lau Sennels
Founder, scaling biologist