NoClassDefFoundError for jersey-core on YARN


Juho Autio
I built a new Flink distribution from the release-1.5 branch today.

I tried running a job but got this error:
java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties

I use yarn-cluster mode.

The jersey-core jar is present in the Hadoop lib on my EMR cluster, but it seems it is no longer picked up.

I checked that the jersey-core classes are not included in the new distribution, but they were not included in my previously built Flink 1.5-SNAPSHOT either, and that build works. Has something changed recently to cause this?

Is this a Flink bug, or should I fix it by explicitly telling the Flink YARN application to use the Hadoop lib?

More details below if needed.

Thanks,
Juho


My launch command is basically:

flink-${FLINK_VERSION}/bin/flink run -m yarn-cluster -yn ${NODE_COUNT} -ys ${SLOT_COUNT} -yjm ${JOB_MANAGER_MEMORY} -ytm ${TASK_MANAGER_MEMORY} -yst -yD restart-strategy=fixed-delay -yD restart-strategy.fixed-delay.attempts=3 -yD "restart-strategy.fixed-delay.delay=30 s" -p ${PARALLELISM} "$@"


I'm also setting this to work around a classloading error (also with the previous build, which still works):
-yD classloader.resolve-order=parent-first


Error stack trace:

java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.getClusterDescriptor(FlinkYarnSessionCli.java:971)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createDescriptor(FlinkYarnSessionCli.java:273)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:449)
at org.apache.flink.yarn.cli.FlinkYarnSessionCli.createClusterDescriptor(FlinkYarnSessionCli.java:92)
at org.apache.fli
Command exiting with ret '31'

Re: NoClassDefFoundError for jersey-core on YARN

Gary Yao-2
Hi Juho,

Can you try submitting with HADOOP_CLASSPATH=`hadoop classpath` set? [1]
For example:
  HADOOP_CLASSPATH=`hadoop classpath` flink-${FLINK_VERSION}/bin/flink run [...]

Best,
Gary
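
Before launching, it may help to confirm that a jersey-core jar actually appears on the Hadoop classpath. The following is only a sketch under stated assumptions: `classpath_has_jar` is a hypothetical helper, the `jersey-core-*.jar` file-name pattern is a guess at how the jar is named on the cluster, and `hadoop classpath --glob` (which expands wildcard entries into individual jars) is assumed to be supported by the installed Hadoop version.

```shell
#!/bin/sh

# Hypothetical helper: succeeds if any entry of a colon-separated classpath
# ($1) has a file name matching the glob pattern ($2).
classpath_has_jar() {
  (
    set -f   # keep wildcard entries such as /usr/lib/hadoop/lib/* literal
    IFS=:
    for entry in $1; do
      # Compare only the file name, not the directory part.
      case "${entry##*/}" in
        $2) exit 0 ;;
      esac
    done
    exit 1
  )
}

# Only runs on a machine where the hadoop CLI is installed.
if command -v hadoop >/dev/null 2>&1; then
  # --glob is assumed to be available; it expands wildcard classpath entries.
  if classpath_has_jar "$(hadoop classpath --glob)" 'jersey-core-*.jar'; then
    # Export so that a subsequent bin/flink invocation in this shell sees it.
    HADOOP_CLASSPATH=$(hadoop classpath)
    export HADOOP_CLASSPATH
    echo "jersey-core found; HADOOP_CLASSPATH exported"
  else
    echo "jersey-core not found on the Hadoop classpath" >&2
  fi
fi
```

The check runs in a subshell so that changing `IFS` and disabling pathname expansion does not leak into the calling shell.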



On Wed, Mar 28, 2018 at 4:26 PM, Juho Autio <[hidden email]> wrote:

Re: NoClassDefFoundError for jersey-core on YARN

Juho Autio
Thank you. The YARN job starts now, but the Flink job itself is stuck in a bad state.

The Flink UI keeps showing status CREATED for all sub-tasks and nothing seems to be happening.

(For the record, this is what I did: export HADOOP_CLASSPATH=`hadoop classpath`, as found at https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html)

I found this in the JobManager log:

2018-03-28 15:26:17,449 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 20, slots allocated: 8
at org.apache.flink.runtime.executiongraph.ExecutionGraph.lambda$scheduleEager$36(ExecutionGraph.java:984)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:551)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:789)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)

After this there was:

2018-03-28 15:26:17,521 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Restarting the job UniqueIdStream (43ed4ace55974d3c486452a45ee5db93).

And some time after that:

2018-03-28 15:27:39,125 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - GET operation failed
java.io.EOFException: Premature end of GET request
at org.apache.flink.runtime.blob.BlobServerConnection.get(BlobServerConnection.java:275)
at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:117)

The TaskManager logs don't contain any errors.

Is that BlobServerConnection error severe enough to make the job get stuck like this? How can I debug this further?

Thanks! 
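
To see how far the job is from its slot requirement, the counts can be pulled out of the NoResourceAvailableException message. This is a minimal sketch, assuming the message format shown in the log excerpt above; `slot_shortfall` is a hypothetical helper name, and in practice the input would be the JobManager log file rather than an inline string.

```shell
#!/bin/sh

# Hypothetical helper: reads log text on stdin and, for every line carrying
# the "Slots required / slots allocated" message, prints the missing count.
slot_shortfall() {
  sed -n 's/.*Slots required: \([0-9][0-9]*\), slots allocated: \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r required allocated; do
      echo "required=$required allocated=$allocated missing=$((required - allocated))"
    done
}

# Example using the message from the log excerpt above.
echo 'Could not allocate all requires slots within timeout of 300000 ms. Slots required: 20, slots allocated: 8' |
  slot_shortfall
# prints: required=20 allocated=8 missing=12
```

In the excerpt above, 12 of the 20 required slots were never allocated within the 300000 ms timeout.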

On Wed, Mar 28, 2018 at 5:56 PM, Gary Yao <[hidden email]> wrote:

Re: NoClassDefFoundError for jersey-core on YARN

Juho Autio
Never mind, I'll post this new problem as a new thread.

On Wed, Mar 28, 2018 at 6:35 PM, Juho Autio <[hidden email]> wrote: