Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Mustafa AKIN
Hi all,

I am trying to use an S3 backend with a custom endpoint. However, this is not supported in hadoop-aws@2.7.3; I need at least version 2.8.0. The underlying reason is that requests are currently sent as follows:

DEBUG [main] (AmazonHttpClient.java:337) - Sending Request: HEAD http://mustafa.localhost:9000 / Headers: 

Because "fs.s3a.path.style.access" is not recognized in old version.I want the domain to remain same, the bucket name to be appended in the path (http://localhost:9000/mustafa/...)

I cannot blindly bump the aws-java-sdk version to the latest; it causes:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.ClientConfiguration
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:182)

So, if I increase hadoop-aws to 2.8.0 together with the latest client, it causes the following error:

Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)

According to, I need hadoop-aws@2.7.2 and


Should I be excluding hadoop-common from Flink somehow? Building Flink from source with mvn clean install -DskipTests -Dhadoop.version=2.8.0 works, but I would like to manage this through Maven as much as possible.
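
To make the question concrete, the kind of exclusion I have in mind would look roughly like this in the pom (a sketch only; it may well not be enough, since the Flink binaries bundle their own Hadoop classes):

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.8.0</version>
  <exclusions>
    <!-- trying to avoid pulling in a second hadoop-common next to the one shipped with Flink -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
  </exclusions>
</dependency>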

Re: Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Aljoscha Krettek
So you're saying that this works if you manually compile Flink for Hadoop 2.8.0? If yes, I think the solution is that we have to provide binaries for Hadoop 2.8.0. If we did that with a possible Flink 1.3.3 release and starting from Flink 1.4.0, would this be an option for you?

Best,
Aljoscha

Re: Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Eron Wright
For reference: [FLINK-6466] Build Hadoop 2.8.0 convenience binaries


Re: Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Mustafa AKIN
Yes, that would probably work. I cloned the master repo, compiled it against 2.8.0, and it worked. It would be nice to have 2.8 binaries, since Hadoop 2.8.1 has also been released.

Mustafa Akın


Re: Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

Aljoscha Krettek
I created an issue for this: https://issues.apache.org/jira/browse/FLINK-7413
