(DEPRECATED) Apache Flink User Mailing List archive.

NoClassDefFoundError for jersey-core on YARN


Juho Autio

NoClassDefFoundError for jersey-core on YARN

I built a new Flink distribution from the release-1.5 branch today.

I tried running a job but got this error:

java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties

I use yarn-cluster mode.

The jersey-core jar is present in the Hadoop lib directory on my EMR cluster, but it seems it's no longer being used.

I checked that the jersey-core classes are not included in the new distribution, but they were not included in my previously built Flink 1.5-SNAPSHOT either, and that one works. Has something changed recently to cause this?

Is this a Flink bug, or should I fix this by somehow explicitly telling the Flink YARN app to use the Hadoop lib now?

More details below if needed.

Thanks,

Juho

My launch command is basically:

flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn ${NODE_COUNT} -ys ${SLOT_COUNT} -yjm ${JOB_MANAGER_MEMORY} -ytm ${TASK_MANAGER_MEMORY} -yst -yD restart-strategy=fixed-delay -yD restart-strategy.fixed-delay.attempts=3 -yD "restart-strategy.fixed-delay.delay=30 s" -p ${PARALLELISM} "$@"

I'm also setting this to fix a classloading error (with the previous build, which still works):

-yD classloader.resolve-order=parent-first

Error stack trace:

java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.getClusterDescriptor(FlinkYarnSessionCli.java:971)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:273)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:449)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:92)
    at org.apache.fli
Command exiting with ret '31'

Gary Yao-2

Re: NoClassDefFoundError for jersey-core on YARN

Hi Juho,

Can you try submitting with HADOOP_CLASSPATH=`hadoop classpath` set? [1]

For example:

HADOOP_CLASSPATH=`hadoop classpath` flink-${FLINK_VERSION}/bin/flink run [...]

Best,

Gary

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpaths
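A minimal sketch of the suggestion above, not an official recipe. It exports HADOOP_CLASSPATH from `hadoop classpath` before the Flink client is started, and does a rough check for jersey-core. The helper name `classpath_mentions_jersey_core` is made up for illustration. The check is only a substring match, so it may miss jars that are reachable through wildcard entries in the classpath.

```shell
# Sketch: make Hadoop's jars visible to the Flink client before submitting.

# Hypothetical helper: returns 0 if the given classpath string names a
# jersey-core jar explicitly. Wildcard entries (for example a lib/* entry)
# are not expanded, so a non-zero result is only a hint, not proof.
classpath_mentions_jersey_core() {
  case "$1" in
    *jersey-core*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v hadoop >/dev/null 2>&1; then
  # Same assignment as suggested in the thread, written with $(...).
  HADOOP_CLASSPATH="$(hadoop classpath)"
  export HADOOP_CLASSPATH
  if ! classpath_mentions_jersey_core "$HADOOP_CLASSPATH"; then
    echo "note: jersey-core not named explicitly; it may sit behind a wildcard entry" >&2
  fi
else
  echo "hadoop not found on PATH; HADOOP_CLASSPATH left unchanged" >&2
fi

# Then launch as before, for example:
# flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster [...]
```

Exporting the variable (rather than prefixing a single command) keeps it set for every later `flink` invocation in the same shell session.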

On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio <[hidden email]> wrote:


Juho Autio

Re: NoClassDefFoundError for jersey-core on YARN

Thank you. The YARN job starts now, but the Flink job itself is in some bad state.

The Flink UI keeps showing status CREATED for all sub-tasks and nothing seems to be happening.

( For the record, this is what I did: export HADOOP_CLASSPATH=`hadoop classpath` – as found at https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html )

I found this in the job manager log:

2018-03-28 15:26:17,449 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 20, slots allocated: 8
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$36(ExecutionGraph.java:984)
    at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
    at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:551)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:789)
    at akka.dispatch.OnComplete.internal(Future.scala:258)
    at akka.dispatch.OnComplete.internal(Future.scala:256)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
    at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)

After this there was:

2018-03-28 15:26:17,521 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Restarting the job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93).

And some time after that:

2018-03-28 15:27:39,125 ERROR org.apache.flink.runtime.blob.BlobServerConnection - GET operation failed
java.io.EOFException: Premature end of GET request
    at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:275)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)

The task manager logs don't have any errors.

Is that BlobServerConnection error severe enough to make the job get stuck like this? How can I debug this further?

Thanks!
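The NoResourceAvailableException above reports 20 required slots against 8 allocated. One quick thing to rule out is a plain mismatch between the launch flags and the parallelism. The sketch below only does that arithmetic. It assumes every TaskManager container offers exactly SLOT_COUNT slots, the function name `missing_slots` is made up, and the numbers are illustrative because the thread does not state the actual NODE_COUNT or SLOT_COUNT.

```shell
# Sketch: compare the requested parallelism with the slots the flags ask for.
# Assumption: each TaskManager container offers exactly slot_count slots.

# Hypothetical helper: prints how many slots are missing (0 if enough).
# The body runs in a subshell so its variables do not leak into the caller.
missing_slots() (
  node_count="$1"
  slot_count="$2"
  parallelism="$3"
  capacity=$(( node_count * slot_count ))
  if [ "$capacity" -ge "$parallelism" ]; then
    echo 0
  else
    echo $(( parallelism - capacity ))
  fi
)

# Illustrative values only, not taken from the thread.
missing_slots 5 4 20   # capacity 20 covers parallelism 20, prints 0
missing_slots 2 4 20   # capacity 8 is short of parallelism 20, prints 12
```

If the arithmetic checks out for the real values, the shortfall would have to come from somewhere else, such as containers that YARN never granted.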

On Wed, Mar 28, 2018 at 5:56 PM, Gary Yao <[hidden email]> wrote:


Juho Autio

Re: NoClassDefFoundError for jersey-core on YARN

Never mind, I'll post this new problem as a new thread.

On Wed, Mar 28, 2018 at 6:35 PM, Juho Autio <[hidden email]> wrote:
