NoClassDefFoundError javax.xml.bind.DatatypeConverterImpl


NoClassDefFoundError javax.xml.bind.DatatypeConverterImpl

Mike Mintz
Hi Flink developers,

We're running some new DataStream jobs on Flink 1.7.0 using the shaded Hadoop S3 file system, and running into frequent errors saving checkpoints and savepoints to S3. I'm not sure what the underlying reason for the error is, but we often fail with the following stack trace, which appears to be due to missing the javax.xml.bind.DatatypeConverterImpl class in an error-handling path for AmazonS3Client.

java.lang.NoClassDefFoundError: Could not initialize class javax.xml.bind.DatatypeConverterImpl
    at javax.xml.bind.DatatypeConverter.initConverter(DatatypeConverter.java:140)
    at javax.xml.bind.DatatypeConverter.printBase64Binary(DatatypeConverter.java:611)
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.Base64.encodeAsString(Base64.java:62)
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.Md5Utils.md5AsBase64(Md5Utils.java:104)
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1647)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1531)


For reference, we're running Flink from the "Apache 1.7.0 Flink only Scala 2.11" binary tgz, we've copied flink-s3-fs-hadoop-1.7.0.jar from opt/ to lib/, we're not defining HADOOP_CLASSPATH, and we're running Java 8 (openjdk version "1.8.0_191") on Ubuntu 18.04 x86_64.

Presumably there are two issues: 1) some periodic error with S3, and 2) some classpath / class-loading issue with javax.xml.bind.DatatypeConverterImpl that's preventing the original error from being displayed. I'm more curious about the latter issue.

This is super puzzling since javax/xml/bind/DatatypeConverterImpl.class is included in our rt.jar, and lsof confirms we're reading that rt.jar, so I suspect it's something tricky with custom class loaders or the way the shaded S3 jar works. Note that this class is not included in flink-s3-fs-hadoop-1.7.0.jar (which we are using), but it is included in flink-shaded-hadoop2-uber-1.7.0.jar (which we are not using).
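
(Side note on semantics, with a minimal sketch that has nothing to do with Flink: "Could not initialize class X" is different from a plain ClassNotFoundException. It means the JVM found the class, but its static initializer threw on an earlier attempt, and every subsequent reference then fails with NoClassDefFoundError. If that's what's happening here, the interesting failure may be whatever made DatatypeConverterImpl's static initializer fail the first time.)

    // Minimal demo with hypothetical class names, not Flink code: the first
    // use of Boom fails with ExceptionInInitializerError; every later use
    // fails with NoClassDefFoundError: Could not initialize class InitDemo$Boom.
    public class InitDemo {
        static class Boom {
            static {
                if (true) throw new RuntimeException("static init failed");
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < 2; i++) {
                try {
                    new Boom();
                } catch (Throwable t) {
                    System.out.println(t);
                }
            }
        }
    }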

Another thing that jumped out at us is that Flink 1.7 can now be built with JDK 9, while Java 9 deprecates the javax.xml.bind libraries and no longer resolves them by default, requiring explicit inclusion as a module [0]. And we saw that direct references to javax.xml.bind were removed from flink-core for 1.7 [1].
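
(For anyone hitting this on Java 9: as we understand it, the deprecated java.xml.bind module still ships with JDK 9 but is not resolved by default, so an application has to ask for it explicitly, roughly like the sketch below; "example.app" is a made-up module name.)

    // Hypothetical module-info.java for Java 9. The deprecated java.xml.bind
    // module is not resolved by default, so it must be required explicitly
    // (or added at launch time with --add-modules java.xml.bind).
    module example.app {
        requires java.xml.bind;
    }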

Some things we tried, without success:
  • Building Flink from source on a machine with Java 8 installed. We still got the NoClassDefFoundError.
  • Running the binary version of Flink on machines with Java 9 installed. We got a NullPointerException in ClosureCleaner instead.
  • Downloading the jaxb-api jar [2], which contains javax/xml/bind/DatatypeConverterImpl.class, and setting HADOOP_CLASSPATH to include that jar. We still got the NoClassDefFoundError.
  • Using iptables to completely block S3 traffic, hoping this would make the failure easier to reproduce. The connection errors were displayed properly, so they must go down a different error-handling path.
Would love to hear any ideas about what might be happening, or further things we can try.

Thanks!
Mike




Re: NoClassDefFoundError javax.xml.bind.DatatypeConverterImpl

Chesnay Schepler
Small correction: Flink 1.7 does not support JDK 9; we only fixed some of the issues, not all of them.


Re: NoClassDefFoundError javax.xml.bind.DatatypeConverterImpl

Mike Mintz
For what it's worth, we believe we are able to work around this issue by adding the following line to our flink-conf.yaml:

classloader.parent-first-patterns.additional: javax.xml.;org.apache.xerces.
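
Our best guess at why this helps (treat it as a guess): Flink's user-code classloader is child-first by default, and classloader.parent-first-patterns.additional makes classes whose names start with the listed prefixes delegate to the parent classloader instead, so javax.xml.bind.* resolves to the bootstrap copy in rt.jar. A quick way to sanity-check where the class ends up coming from (hypothetical snippet, not Flink code):

    // A null classloader means the class was loaded from the bootstrap
    // classpath (rt.jar on Java 8).
    public class WhoLoadsIt {
        public static void main(String[] args) throws ClassNotFoundException {
            Class<?> impl = Class.forName("javax.xml.bind.DatatypeConverterImpl");
            System.out.println(impl.getClassLoader());  // null => bootstrap
        }
    }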


On Thu, Dec 6, 2018 at 2:28 AM Chesnay Schepler <[hidden email]> wrote:
Small correction: Flink 1.7 does not support JDK 9; we only fixed some of the issues, not all of them.
