Hi guys,
I'm using AvroParquetWriter to write Parquet files into S3. When I set up the cluster (starting fresh jobmanager/taskmanager instances, etc.), the scheduled job starts executing without problems and writes the files into S3, but if the job is canceled and started again, it throws the exception java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket.
Environment configuration:
- Apache Flink 1.10
- Scala 2.12
- the uber jar flink-shaded-hadoop-2-uber-2.8.3-10.0.jar is on the application classloader (/lib)
- the plugins folder contains an s3-fs-hadoop folder with the jar flink-s3-fs-hadoop-1.10.0.jar

I can work around this issue by adding the joda-time dependency to the Flink lib folder and excluding joda-time from the hadoop-aws dependency that is required by the application code. Do you know what the root cause of this is? Or is there something I could do other than adding the joda-time dependency to the Flink lib folder?

Thanks,
regards, Diogo Santos
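P.S. For reference, the exclusion looks roughly like this in an sbt build (a sketch only; the hadoop-aws version shown here is illustrative and should match whatever the project actually uses):

    // build.sbt: exclude joda-time from hadoop-aws so it is not bundled in the job jar;
    // joda-time is then provided via Flink's lib/ folder instead
    libraryDependencies +=
      "org.apache.hadoop" % "hadoop-aws" % "2.8.3" exclude("joda-time", "joda-time")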
Hi Diogo,

thanks for reporting this issue. It looks quite strange, to be honest. flink-s3-fs-hadoop-1.10.0.jar contains the DateTimeParserBucket class, so either this class wasn't loaded when starting the application from scratch, or there could be a problem with the plugin mechanism on restarts. I'm pulling in Arvid, who worked on the plugin mechanism and might be able to tell us more.

In the meantime, could you provide us with the logs? They might tell us a bit more about what happened.

Cheers,
Till

On Wed, Apr 15, 2020 at 5:54 PM Diogo Santos <[hidden email]> wrote:
For future reference, here is the stack trace in an easier to read format:

Caused by: java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:825)
    at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtils.java:196)
    at com.amazonaws.services.s3.internal.ServiceUtils.parseRfc822Date(ServiceUtils.java:88)
    at com.amazonaws.services.s3.internal.AbstractS3ResponseHandler.populateObjectMetadata(AbstractS3ResponseHandler.java:121)
    at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:32)
    at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:25)
    at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:69)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1714)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1434)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1356)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1335)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1309)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:904)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1553)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:929)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
    at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246)
    at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280)
    at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:535)
    at ...

On Thu, Apr 16, 2020 at 9:26 AM Till Rohrmann <[hidden email]> wrote:
Hi Till,
it definitely seems to be a strange issue. The first time the job is run (on a clean instance of the cluster) it goes well, but if it is canceled or started again the issue comes back.

I built an example here: https://github.com/congd123/flink-s3-example

You can build the artifact of the Flink job and start the cluster with the configuration in the docker-compose file.

Thanks for helping

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hi Diogo,

I have seen similar issues already. The root cause is usually that users are not actually using any Flink-specific functionality, but going to Hadoop's Parquet writer directly. As you can see in your stack trace, there is not a single reference to a Flink class. The usual solution is to use the respective Flink sink instead of bypassing it [1]. If you opt to implement it manually nonetheless, it is probably easier to bundle Hadoop as a non-Flink dependency.

On Thu, Apr 16, 2020 at 5:36 PM Diogo Santos <[hidden email]> wrote:
> Hi Till,

--
Arvid Heise | Senior Java Developer

Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
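For future reference, the respective Flink sink in 1.10 would be the StreamingFileSink with a Parquet bulk writer. A minimal Scala sketch follows, assuming a stream of Avro GenericRecords; the S3 path, the helper name addParquetSink, and the schema are placeholders, not taken from the original job:

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord
    import org.apache.flink.core.fs.Path
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.scala.DataStream

    // Attach Flink's bulk-format Parquet sink to an existing stream of Avro GenericRecords.
    def addParquetSink(stream: DataStream[GenericRecord], schema: Schema): Unit = {
      val sink: StreamingFileSink[GenericRecord] = StreamingFileSink
        .forBulkFormat(
          new Path("s3://my-bucket/parquet-output"),    // placeholder path, served by the flink-s3-fs-hadoop plugin
          ParquetAvroWriters.forGenericRecord(schema))  // requires the flink-parquet dependency
        .build()

      stream.addSink(sink)
    }

Note that the StreamingFileSink only finalizes part files on checkpoints, so checkpointing has to be enabled for the Parquet output to become visible.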