Not sure why, when I submit the job at the first time after a cluster launch,
it is working fine. After I cancelled the first job, then resubmit the same job again, it will hit the NoClassDefFoundError. Very weird, feels like some clean up of a cancelled job messed up future job of the same classes. Anyone got the same issue? java.lang.NoClassDefFoundError: com.xxx.yyy.zzz$Builder at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethods(Class.java:1975) at org.apache.flink.api.java.typeutils.TypeExtractionUtils.getAllDeclaredMethods(TypeExtractionUtils.java:243) at org.apache.flink.api.java.typeutils.TypeExtractor.analyzePojo(TypeExtractor.java:1949) at org.apache.flink.api.java.typeutils.AvroTypeInfo.generateFieldsFromAvroSchema(AvroTypeInfo.java:55) at org.apache.flink.api.java.typeutils.AvroTypeInfo.<init>(AvroTypeInfo.java:48) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1810) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1716) at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:953) at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:814) at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:768) at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:764) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.createSubclassSerializer(PojoSerializer.java:1129) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.getSubclassSerializer(PojoSerializer.java:1122) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:253) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:526) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503) at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891) at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869) at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:309) at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:408) at org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.emitRecordAndUpdateState(KinesisDataFetcher.java:486) at org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.deserializeRecordForCollectionAndUpdateState(ShardConsumer.java:263) at org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:209) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: com.xxx.yyy.zzz$Builder at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 31 more -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Yes, we've seen this issue as well, though it usually takes many more
resubmits before the error pops up. Interestingly, of the 7 jobs we run (all of which use different Avro schemas), we only see this issue on 1 of them. Once the NoClassDefFoundError crops up though, it is necessary to recreate the task managers. There's a whole page on the Flink documentation on debugging classloading, and Avro is mentioned several times on that page: https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html It seems that (in 1.3 at least) each submitted job has its own classloader, and its own instance of the Avro class definitions. However, the Avro class cache will keep references to the Avro classes from classloaders for the previous cancelled jobs. That said, we haven't been able to find a solution to this error yet. Flink 1.4 would be worth a try because the of the changes to the default classloading behaviour (child-first is the new default, not parent-first). -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Hi!
We changed a few things between 1.3 and 1.4 concerning Avro. One of the main things is that Avro is no longer part of the core Flink class library, but needs to be packaged into your application jar file. The class loading / caching issues of 1.3 with respect to Avro should disappear in Flink 1.4, because Avro classes and caches are scoped to the job classloaders, so the caches do not go across different jobs, or even different operators. Please check: Make sure you have Avro as a dependency in your jar file (in scope "compile"). Hope that solves the issue. Stephan On Mon, Jan 22, 2018 at 2:31 PM, Edward <[hidden email]> wrote: Yes, we've seen this issue as well, though it usually takes many more |
I am currently suffering through similar issues. Had a job running happily, but when it the cluster tried to restarted it would not find the JSON serializer in it. The job kept trying to restart in a loop. Just today I was running a job I built locally. The job ran fine. I added two commits and rebuilt the jar. Now the job dies when it tries to start claiming it can't find the time assigner class. I've unzipped the job jar, both locally and in the TM blob directory and have confirmed the class is in it. This is the backtrace: java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73) at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) at java.io.ObjectInputStream.readClassDesc(Unknown Source) at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source) at java.io.ObjectInputStream.readObject0(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380) at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368) at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58) at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542) at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167) at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89) at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Unknown Source) On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <[hidden email]> wrote:
|
Something seems to be off with the user code class loader. The only way I can get my job to start is if I drop the job into the lib folder in the JM and configure the JM's classloader.resolve-order to parent-first. Suggestions? On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy <[hidden email]> wrote:
|
@Elias This is a know issue that will be fixed in 1.4.2 which we will do very quickly just because of this bug: https://issues.apache.org/jira/browse/FLINK-8741.
|
Free forum by Nabble | Edit this page |