Can not resolve org.apache.hadoop.fs.Path in 1.4.0

Re: Can not resolve org.apache.hadoop.fs.Path in 1.4.0

For now, I have solved this issue by adding the following to the Flink config:


That way it ignores the duplicate classes from the uber jar. I will work on the dependencies. One quick question: I am using SBT for the build. Do you have an example sbt file for the dependencies? I am a bit confused. Should I set all the others to "provided" too, so that they won't be included in the fat jar? But according to the docs, CEP etc. are not part of flink-dist. I haven't checked the contents of flink-dist yet; I'm just asking a quick question.

val flinkDependencies = Seq(
  "org.slf4j" % "slf4j-log4j12" % "1.7.21",
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-table" % flinkVersion,
  "org.apache.flink" %% "flink-cep-scala" % flinkVersion,
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion,
  "org.apache.flink" %% "flink-connector-filesystem" % flinkVersion,
  "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion,
  "org.apache.flink" %% "flink-connector-cassandra" % flinkVersion,
  "org.apache.flink" % "flink-shaded-hadoop2" % flinkVersion
)

On Wed, Dec 20, 2017 at 7:48 PM, Timo Walther <[hidden email]> wrote:
Libraries such as CEP or the Table API should have the "compile" scope and should be in both the fat and the non-fat jar.

The non-fat jar should contain everything that is not in flink-dist or your lib directory.
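
As a rough illustration of the scoping advice above, a build.sbt could be laid out as follows. This is only a sketch: the project name, the Scala version, and the split into two Seqs are assumptions made for the example, and which artifacts count as "already on the cluster" depends on what is actually in flink-dist and in your lib directory.

```scala
// build.sbt (sketch; name and versions are assumptions for illustration)
name := "flink-job"
scalaVersion := "2.11.12"

val flinkVersion = "1.4.0"

// Core APIs that ship with flink-dist: compile against them,
// but keep them out of the fat jar.
val providedDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided"
)

// Libraries and connectors that are not part of flink-dist:
// default ("compile") scope, so they are packaged with the job.
val bundledDependencies = Seq(
  "org.apache.flink" %% "flink-table" % flinkVersion,
  "org.apache.flink" %% "flink-cep-scala" % flinkVersion,
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion,
  "org.apache.flink" %% "flink-connector-filesystem" % flinkVersion,
  "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion,
  "org.apache.flink" %% "flink-connector-cassandra" % flinkVersion
)

libraryDependencies ++= providedDependencies ++ bundledDependencies
```

Anything that is already present in the Flink lib directory (for example an SLF4J binding or a Hadoop bundle) would, under the same reasoning, also be a candidate for the "provided" scope.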


On 12/20/17 at 3:07 PM, shashank agarwal wrote:
Hi,

In that case, it won't find the dependencies, because I have other dependencies as well. And what about CEP etc., since those are not part of flink-dist?


On Wed, Dec 20, 2017 at 3:16 PM, Aljoscha Krettek <[hidden email]> wrote:

That jar file looks like it has too much stuff in it that shouldn't be there. This can explain the errors you are seeing, because of classloading conflicts.

Could you try not building a fat-jar and have only your code in your jar?


On 20. Dec 2017, at 10:15, shashank agarwal <[hidden email]> wrote:

One more thing: when I submit the job or start a YARN session, it prints the following logs:

Using the result of 'hadoop classpath' to augment the Hadoop classpath: /usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

So I think it is adding the Hadoop libs to the classpath too, because it is able to create the checkpointing directories from the flink-conf file on HDFS.

On Wed, Dec 20, 2017 at 2:31 PM, shashank agarwal <[hidden email]> wrote:
Hi,

Please find attached the list of jar file contents and the flink/lib/ contents. I have removed my own class files from the jar list. I later added flink-hadoop-compatibility_2.11-1.4.0.jar to flink/lib/, but with no success.

I have also tried removing flink-shaded-hadoop2 from my project, but still no success.


On Wed, Dec 20, 2017 at 2:14 PM, Aljoscha Krettek <[hidden email]> wrote:

Could you please list what exactly is in your submitted jar file, for example using "jar tf my-jar-file.jar"? And also, what files exactly are in your Flink lib directory?
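
If the raw "jar tf" listing is too long to read, a summary per package can make unexpected contents (for example bundled Hadoop or Flink core classes) easier to spot. The following is a minimal sketch using only the JDK's java.util.jar API; the object name, the method name, and the default depth of three path segments are choices made for this example.

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object JarContents {
  /** Counts .class entries per package prefix (the first `depth` path segments). */
  def classCountsByPackage(jarPath: String, depth: Int = 3): Map[String, Int] = {
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .map(_.getName)
        .filter(_.endsWith(".class"))
        // drop the class file name, keep the leading package segments
        .map(_.split('/').dropRight(1).take(depth).mkString("."))
        .toList
        .groupBy(identity)
        .map { case (pkg, names) => pkg -> names.size }
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to the submitted jar, e.g. my-jar-file.jar
    classCountsByPackage(args(0)).toSeq.sortBy(_._1).foreach {
      case (pkg, n) => println(f"$n%6d  $pkg")
    }
  }
}
```

Large counts under prefixes such as org.apache.hadoop or org.apache.flink in a job jar would be a hint that classes already provided by the cluster are being bundled again.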


On 19. Dec 2017, at 20:10, shashank agarwal <[hidden email]> wrote:

Hi Timo,

I am using RocksDBStateBackend with an HDFS path. I have the following Flink dependencies in my sbt file:

"org.slf4j" % "slf4j-log4j12" % "1.7.21",
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-cep-scala" % flinkVersion,
  "org.apache.flink" %% "flink-connector-kafka-0.10" % flinkVersion,
  "org.apache.flink" %% "flink-connector-filesystem" % flinkVersion,
  "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion,
  "org.apache.flink" %% "flink-connector-cassandra" % "1.3.2",
  "org.apache.flink" % "flink-shaded-hadoop2" % flinkVersion,

When I start the Flink YARN session, it works fine; it even creates the Flink checkpointing directory and copies the libs into HDFS.

But when I submit the application to this YARN session, it prints the following logs:

Using the result of 'hadoop classpath' to augment the Hadoop classpath: /usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*
But the application fails continuously with the logs which I sent earlier.

I have tried adding flink-hadoop-compatibility*.jar as suggested by Jorn, but it's not working.

On Tue, Dec 19, 2017 at 5:08 PM, shashank agarwal <[hidden email]> wrote:
Yes, it's working fine. I am no longer getting the compile-time error.

But when I try to run this on a cluster or on YARN, I get the following runtime error:

org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
	at org.apache.flink.core.fs.FileSystem.get(
	at org.apache.flink.core.fs.Path.getFileSystem(
	at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.<init>(
	at org.apache.flink.runtime.state.filesystem.FsStateBackend.createStreamFactory(
	at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createStreamFactory(
	at org.apache.flink.streaming.runtime.tasks.StreamTask.createCheckpointStreamFactory(
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop File System abstraction does not support scheme 'hdfs'. Either no file system implementation exists for that scheme, or the relevant classes are missing from the classpath.
	at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
	... 12 more
Caused by: No FileSystem for scheme: hdfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
	at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(
	... 13 more




While submitting the job it prints the following logs, so I think it is including the Hadoop libs:

Using the result of 'hadoop classpath' to augment the Hadoop classpath: /usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*:/usr/hdp/*

On Fri, Dec 8, 2017 at 9:24 PM, shashank agarwal <[hidden email]> wrote:
Sure, I'll try that. Thanks.

On Fri, 8 Dec 2017 at 9:18 PM, Stephan Ewen <[hidden email]> wrote:
I would recommend adding "flink-shaded-hadoop2". That is a bundle of all Hadoop dependencies used by Flink.

On Fri, Dec 8, 2017 at 3:44 PM, Aljoscha Krettek <[hidden email]> wrote:
I see, thanks for letting us know!

On 8. Dec 2017, at 15:42, shashank agarwal <[hidden email]> wrote:

I had to include two dependencies.

hadoop-hdfs (this for HDFS configuration) 
hadoop-common (this for Path)
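
In sbt terms, the two dependencies above might be declared roughly like this. It is a sketch under stated assumptions: hadoopVersion is a placeholder that should match the cluster's Hadoop version, and the "provided" scope is my choice for the example, on the reasoning that the cluster contributes the Hadoop jars through 'hadoop classpath' at runtime.

```scala
// build.sbt fragment (sketch). hadoopVersion is an assumed placeholder;
// set it to the Hadoop version running on your cluster.
val hadoopVersion = "2.7.3"

libraryDependencies ++= Seq(
  // HDFS client classes and configuration
  "org.apache.hadoop" % "hadoop-hdfs" % hadoopVersion % "provided",
  // org.apache.hadoop.fs.Path lives in hadoop-common
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided"
)
```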

On Fri, Dec 8, 2017 at 7:38 PM, Aljoscha Krettek <[hidden email]> wrote:
I think hadoop-hdfs might be sufficient.

On 8. Dec 2017, at 14:48, shashank agarwal <[hidden email]> wrote:

Can you advise specifically which dependencies I should add to extend this:

On Fri, Dec 8, 2017 at 6:58 PM, shashank agarwal <[hidden email]> wrote:
It's a compilation error. I think I have to include the Hadoop dependencies.

On Fri, Dec 8, 2017 at 6:54 PM, Aljoscha Krettek <[hidden email]> wrote:

Is this a compilation error or a runtime error? In general, though, yes: you have to include the Hadoop dependencies if they're not there.


On 8. Dec 2017, at 14:10, shashank agarwal <[hidden email]> wrote:

Hello,

I am trying to test 1.4.0-RC3; the Hadoop libraries have been removed in this version. I have created a custom Bucketer for the bucketing sink. I am extending 


In the class, I have to use org.apache.hadoop.fs.Path, but as the Hadoop libraries have been removed, it gives the error:

"object hadoop is not a member of package org.apache"

Should I include the Hadoop client libs in the build.sbt dependencies?
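
For context, a custom bucketer of the kind described might look like the sketch below. It assumes the interface in question is the Bucketer from flink-connector-filesystem (the original message does not show which type is extended), and the Event case class and CategoryBucketer name are hypothetical. It shows why org.apache.hadoop.fs.Path has to be on the compile classpath: the type appears directly in the method signature.

```scala
import org.apache.flink.streaming.connectors.fs.Clock
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer
import org.apache.hadoop.fs.Path

// Hypothetical event type, used only for this illustration.
case class Event(category: String, payload: String)

// Routes each element into a sub-directory of the base path
// named after the element's category.
class CategoryBucketer extends Bucketer[Event] {
  override def getBucketPath(clock: Clock, basePath: Path, element: Event): Path =
    new Path(basePath, element.category)
}
```

Compiling this requires both flink-connector-filesystem and a Hadoop artifact that provides org.apache.hadoop.fs.Path on the classpath.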


