Dockerised Flink 1.8 with Hadoop S3 FS support


Dockerised Flink 1.8 with Hadoop S3 FS support

Lorenzo Nicora
Hi

I need to set up a dockerized session cluster using Flink 1.8.2 for development and troubleshooting. We are bound to 1.8.2 as we are deploying to AWS Kinesis Data Analytics for Flink.

I am using an image based on the semi-official flink:1.8-scala_2.11
I need to add support for the S3 Hadoop File System (s3a://) to my dockerized cluster, which we have on KDA out of the box.

Note I do not want to add dependencies to the job directly, as I want to deploy locally exactly the same JAR I deploy to KDA.

Flink 1.8 docs [1] say S3 is supported out of the box, but that does not appear to be the case for the dockerised version.
I am getting "Could not find a file system implementation for scheme 's3a'" and "Hadoop is not in the classpath/dependencies".
I assume I need to create a customised Docker image extending flink:1.8-scala_2.11, but I do not understand how to add support for the S3 Hadoop FS.

Can someone please point me in the right direction? Docs or examples?




Lorenzo
Re: Dockerised Flink 1.8 with Hadoop S3 FS support

Yang Wang
Hi Lorenzo,

Since Flink 1.8 does not support the plugin mechanism for loading filesystems, you need to copy flink-s3-fs-hadoop-*.jar
from the opt directory to the lib directory.

The Dockerfile could look like the following.

FROM flink:1.8-scala_2.11
RUN cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/lib
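One more thing to keep in mind (my assumption, since KDA normally supplies credentials via its IAM role): when running locally, the shaded Hadoop S3 filesystem needs AWS credentials configured explicitly, for example in flink-conf.yaml:

```yaml
# Minimal sketch: credentials for the shaded s3a filesystem when running
# outside AWS. The values shown are placeholders, not real keys.
s3.access-key: YOUR_ACCESS_KEY
s3.secret-key: YOUR_SECRET_KEY
```

Flink forwards these keys to the bundled Hadoop configuration, so you do not need a separate core-site.xml for the simple access-key case.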

Then build your Docker image and start the session cluster again.
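For a local session cluster, a docker-compose file along these lines should work (a sketch based on the standard flink-docker setup; service names and the build context are assumptions):

```yaml
version: "2.1"
services:
  jobmanager:
    build: .            # builds the custom image from the Dockerfile above
    command: jobmanager
    ports:
      - "8081:8081"     # Flink web UI
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    build: .
    command: taskmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
```

With `docker-compose up`, jobs submitted through the REST endpoint on port 8081 should then be able to read and write s3a:// paths.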


Best,
Yang


Lorenzo Nicora <[hidden email]> wrote on Thu, 2 Jul 2020 at 18:05: