Hadoop FS when running standalone

Hadoop FS when running standalone

Lorenzo Nicora
Hi

I need to run my streaming job as a standalone Java application for testing.
The job uses the Hadoop S3 FileSystem, and I need to test it for real (not as a unit test).

The job works fine when deployed (I am using AWS Kinesis Data Analytics, so Flink 1.8.2).

I have org.apache.flink:flink-s3-fs-hadoop as a "compile" dependency.

For running standalone, I have a Maven profile that adds the dependencies that are normally provided (see the sketch below):

- org.apache.flink:flink-java
- org.apache.flink:flink-streaming-java_2.11
- org.apache.flink:flink-statebackend-rocksdb_2.11
- org.apache.flink:flink-connector-filesystem_2.11

With this profile I still get the error "Hadoop is not in the classpath/dependencies" and the job does not run.
I also tried adding org.apache.flink:flink-hadoop-fs, with no luck.
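
In case it helps, here is a minimal sketch of my pom setup (the profile id and the 1.8.2 versions are assumptions I am writing out just to make the example self-contained; they match the Flink 1.8.2 runtime mentioned above):

    <!-- regular compile-scope dependency, as mentioned above -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-s3-fs-hadoop</artifactId>
        <version>1.8.2</version> <!-- assumed: matches the Flink 1.8.2 runtime -->
    </dependency>

    <!-- profile adding the normally-provided dependencies for standalone runs -->
    <profile>
        <id>standalone</id> <!-- hypothetical profile id -->
        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>1.8.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.11</artifactId>
                <version>1.8.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
                <version>1.8.2</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-filesystem_2.11</artifactId>
                <version>1.8.2</version>
            </dependency>
        </dependencies>
    </profile>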

What dependencies am I missing?

Cheers
Lorenzo
Re: Hadoop FS when running standalone

Alessandro Solimando
Hi Lorenzo,
IIRC, I got the same error message when trying to write Snappy-compressed Parquet to HDFS from a standalone fat JAR.

Flink could not find the Hadoop native/binary libraries (in my case I think the problem was specifically with Snappy), because my HADOOP_HOME was not properly set.

I have never used S3, so I don't know whether the same applies here, but it is worth checking.

Best regards,
Alessandro

Re: Hadoop FS when running standalone

Lorenzo Nicora
Thanks Alessandro,

I think I have solved it.
I cannot set HADOOP_HOME, as there is no Hadoop installed on the machine running my tests.
But adding org.apache.flink:flink-shaded-hadoop-2:2.8.3-10.0 as a compile dependency to the Maven profile that builds the standalone version fixed the issue.
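
For anyone who hits the same error, this is the dependency that fixed it for me, added inside the standalone profile (coordinates exactly as above; the explicit compile scope is only there for clarity):

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-shaded-hadoop-2</artifactId>
        <version>2.8.3-10.0</version>
        <scope>compile</scope>
    </dependency>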

Lorenzo

