SAX2 driver class org.apache.xerces.parsers.SAXParser not found

Averell
Hello,

I have a Flink 1.10 job running on AWS EMR. It checkpoints to S3a and also
writes its output to S3a using StreamingFileSink. The job ran well until I
added the following Hadoop property as a Java option:
-Dfs.s3a.acl.default=BucketOwnerFullControl
Since then, checkpoints fail to complete:

Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

I tried adding a jar file containing that class
(https://mvnrepository.com/artifact/xerces/xercesImpl/2.12.0) to my
flink/lib/ directory, but got the same error with a different stack trace:
Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found


This looks like a dependency conflict, but I couldn't track down its root cause.
In my IDE I don't see any dependency issues, and I can't find
SAXParser anywhere in the dependency tree.
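
A question like "where does this class come from, if anywhere?" can be checked directly on the JVM. The following is only a minimal diagnostic sketch, not something taken from Flink or the AWS SDK: the class `ClassOriginProbe` and its `describe` helper are hypothetical names made up for illustration. It asks a given class loader for a class by name and reports the jar (code source) it was loaded from. The second class name is the Xerces copy bundled inside the JDK, which lives under a different package than the standalone Xerces jar.

```java
import java.security.CodeSource;

public class ClassOriginProbe {

    /** Describes where the named class comes from, as seen by the given class loader. */
    public static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: we only want to locate the class, not run its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String origin = (src == null || src.getLocation() == null)
                    ? "JDK runtime (no code source)"
                    : src.getLocation().toString();
            return className + " -> " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        // Standalone Xerces (what the error message asks for)
        System.out.println(describe("org.apache.xerces.parsers.SAXParser", ctx));
        // Xerces copy shipped inside the JDK
        System.out.println(describe("com.sun.org.apache.xerces.internal.parsers.SAXParser", ctx));
        // A non-null value here forces a specific SAX driver class name
        System.out.println("org.xml.sax.driver = " + System.getProperty("org.xml.sax.driver"));
    }
}
```

Running it with the same classpath as the failing process (rather than the IDE classpath) is what makes the result meaningful, since the two can differ.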

Could you please help me find the cause?

Thanks!

Here is the stacktrace when the jar file is not there:
Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://mybucket/checkpoint/a9502b1c81ced10dfcbb21ac43f03e61/chk-2/41f51c24-60fd-474b-9f89-3d65d87037c7: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
        at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
        at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
        at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
        at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
        at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
        ... 17 more
Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
        at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
        ... 29 more
Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
        at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
        at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
        ... 52 more


And here is the stacktrace when that jar file is added to the /lib/ folder:

Could not materialize checkpoint 1 for operator Source: <my_operators_chain> (1/2).
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: Could not open output stream for state backend
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
        at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
        at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
        ... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: Could not open output stream for state backend
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:78)
        at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:33)
        at org.apache.flink.runtime.state.PartitionableListState.write(PartitionableListState.java:116)
        at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:155)
        at org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
        at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
        ... 5 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: getFileStatus on s3a://mybucket/checkpoint/d8ed6d1524169c942bbc455d2c519a39/chk-1/7f2d8fd6-4f3f-4da7-9ffd-5a7e3ea8e7e3: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
        at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
        at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
        at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
        at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
        at org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
        ... 17 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: Couldn't initialize a SAX driver to create an XMLReader
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
        at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
        ... 29 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
        at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
        at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
        ... 52 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: org.apache.xerces.parsers.SAXParser
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
        at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
        ... 54 common frames omitted

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

rmetzger0
Hi,
I guess you've loaded the S3 filesystem through the S3 filesystem plugin.

You need to put the jar file containing the SAX2 driver class into the same plugin directory as the S3 filesystem plugin.
You can probably find the name of the right SAX2 jar file from your local setup, where everything works.
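
Why the directory matters can be sketched with a deliberately simplified model of class loader isolation. This is an illustration only: `IsolatedLoaderDemo` and its helpers are hypothetical names, and the empty `URLClassLoader` with no application parent merely stands in for a plugin class loader that does not have the jar in question. Flink's actual plugin class loader is more involved than this.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedLoaderDemo {

    /** Returns true if the given loader can resolve the class name. */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * A loader with no jars of its own and no application parent (parent = null
     * means only JDK bootstrap classes are reachable through delegation).
     */
    public static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        String self = IsolatedLoaderDemo.class.getName();
        ClassLoader app = IsolatedLoaderDemo.class.getClassLoader();
        ClassLoader isolated = isolatedLoader();

        // The application loader sees classes on the application classpath
        System.out.println("app loader sees demo class:      " + canLoad(app, self));
        // The isolated loader does not, even though the class is on the classpath
        System.out.println("isolated loader sees demo class: " + canLoad(isolated, self));
        // JDK classes stay reachable through the bootstrap loader
        System.out.println("isolated loader sees JDK class:  " + canLoad(isolated, "java.lang.String"));
    }
}
```

In this model, a class that is present on the main classpath is still invisible to a loader that neither contains its jar nor delegates to the loader that does.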

I hope that helps!

Best,
Robert 

Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

Arvid Heise-3
Hi Averell,

This is a known bug [1], caused by the AWS S3 library in use not respecting the classloader [2].

The best solution is to upgrade to Flink 1.10.1 (or to take the s3-hadoop jar from 1.10.1). Don't try to add Xerces manually anywhere.
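
For readers unfamiliar with what "respecting the classloader" can involve: many libraries resolve classes by name through the thread context class loader, so the result depends on which loader is installed on the calling thread at that moment. The sketch below is a generic illustration of that mechanism and of the usual save-and-restore pattern. It is not the code of the Flink fix or of the AWS SDK, and `ContextLoaderDemo` with its methods is a hypothetical name.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextLoaderDemo {

    /** Resolves a class by name through the current thread's context class loader. */
    public static boolean visibleViaContextLoader(String className) {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        try {
            Class.forName(className, false, ctx);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Runs the lookup with a temporarily installed context class loader, then restores the old one. */
    public static boolean visibleWith(ClassLoader temporary, String className) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(temporary);
        try {
            return visibleViaContextLoader(className);
        } finally {
            // Always restore, so later code on this thread is not affected
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        String self = ContextLoaderDemo.class.getName();
        ClassLoader app = ContextLoaderDemo.class.getClassLoader();
        ClassLoader empty = new URLClassLoader(new URL[0], null);

        // Same class name, same thread, different answers depending on the installed loader
        System.out.println("with app loader:   " + visibleWith(app, self));
        System.out.println("with empty loader: " + visibleWith(empty, self));
    }
}
```

The point of the sketch is only that a name-based lookup can succeed or fail for the same class depending on the loader in effect, which is why a manually added jar does not necessarily help.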


On Thu, Aug 27, 2020 at 4:34 PM Robert Metzger <[hidden email]> wrote:
Hi,
I guess you've loaded the S3 filesystem using the s3 FS plugin.

You need to put the right jar file containing the SAX2 driver class into the plugin directory where you've also put the S3 filesystem plugin.
You can probably find out the name of the right sax2 jar file from your local setup where everything is working.

I hope that helps!

Best,
Robert 

On Thu, Aug 27, 2020 at 1:38 PM Averell <[hidden email]> wrote:
Hello,

I have a Flink 1.10 job which runs in AWS EMR, checkpointing to S3a as well
as writing output to S3a using StreamingFileSink. The job runs well until I
add the Java Hadoop properties:  /-Dfs.s3a.acl.default=
BucketOwnerFullControl/. Since after that, the checkpoint process fails to
complete.

/Caused by: org.xml.sax.SAXException: SAX2 driver class
org.apache.xerces.parsers.SAXParser not found/
I tried to add a jar file with that class
(https://mvnrepository.com/artifact/xerces/xercesImpl/2.12.0) to my
flink/lib/ directory, then got the same error but different stacktrace:
/Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class
org.apache.xerces.parsers.SAXParser not found/

This seems to be a dependencies conflict, but I couldn't track its root.
In my IDE I didn't have any dependencies issue, while I couldn't find
SAXParser in the dependencies tree.

*Here is the stacktrace when the jar file is not there:*
/Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on
s3a://mybucket/checkpoint/a9502b1c81ced10dfcbb21ac43f03e61/chk-2/41f51c24-60fd-474b-9f89-3d65d87037c7:
com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create
an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
        at
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
        at
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
        at
org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
        at
org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
        at
org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
        at
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
        at
org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
        at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
        ... 17 more
Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX
driver to create an XMLReader
        at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
        at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
        at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
        at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
        at
com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
        at
org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
        at
org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
        ... 29 more
Caused by: org.xml.sax.SAXException: SAX2 driver class
org.apache.xerces.parsers.SAXParser not found
java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
        at
org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
        at
org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
        at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
        ... 52 more/

*And here is the stacktrace when that jar file is added to the /lib/ folder:*

/Could not materialize checkpoint 1 for operator Source:
<my_operators_chain> (1/2).
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:1238)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1180)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException:
Could not open output stream for state backend
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at
org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:461)
        at
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
        at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
        ... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: Could not open output
stream for state backend
        at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367)
        at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234)
        at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at
org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:78)
        at
org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.serialize(BytePrimitiveArraySerializer.java:33)
        at
org.apache.flink.runtime.state.PartitionableListState.write(PartitionableListState.java:116)
        at
org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:155)
        at
org.apache.flink.runtime.state.DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackendSnapshotStrategy.java:108)
        at
org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:75)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
        ... 5 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: getFileStatus on
s3a://mybucket/checkpoint/d8ed6d1524169c942bbc455d2c519a39/chk-1/7f2d8fd6-4f3f-4da7-9ffd-5a7e3ea8e7e3:
com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create
an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:749)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
        at
org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:141)
        at
org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.create(HadoopFileSystem.java:37)
        at
org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.create(PluginFileSystemFactory.java:164)
        at
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:126)
        at
org.apache.flink.core.fs.EntropyInjector.createEntropyAware(EntropyInjector.java:61)
        at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:356)
        ... 17 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: Couldn't initialize a
SAX driver to create an XMLReader
        at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
        at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
        at
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
        at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
        at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
        at
com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
        at
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
        ... 29 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: SAX2 driver class
org.apache.xerces.parsers.SAXParser not found
        at
org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
        at
org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
        at
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
        ... 52 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable:
org.apache.xerces.parsers.SAXParser
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at
org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
        at
org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
        ... 54 common frames omitted
/



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


--

Arvid Heise | Senior Java Developer



Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

rmetzger0
Hi Averell,
As far as I know, these tmp files should be removed when the Flink job recovers. So you should see them only for the latest incomplete checkpoint, while recovery has not yet completed.

On Tue, Sep 1, 2020 at 2:56 AM Averell <[hidden email]> wrote:
Hello Robert, Arvid,

I am running on EMR, and AWS currently supports only version 1.10.
I tried both solutions that you suggested ((i) copying a SAXParser
implementation to the plugins folder and (ii) using the S3FS Plugin from
1.10.1), and both worked: I now get successful checkpoints.
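
One possible reading of why the plugins folder worked where flink/lib/ did not, consistent with the PluginLoader$PluginClassLoader frame in the second stack trace, is that the S3 filesystem is loaded through an isolated class loader that does not see every jar on the main class path. The sketch below is a simplified, hypothetical model of that kind of isolation. IsolatingClassLoader and canSee are illustrative names, and this is not Flink's actual implementation:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Simplified, hypothetical model of class-loader isolation; not Flink's actual PluginClassLoader.
class IsolatedLoaderDemo {

    /** Loads allow-listed packages from the owner loader and everything else from its own jars. */
    static class IsolatingClassLoader extends URLClassLoader {
        private final ClassLoader owner;
        private final String[] allowedPrefixes;

        IsolatingClassLoader(URL[] ownJars, ClassLoader owner, String... allowedPrefixes) {
            super(ownJars, null); // null parent: only JDK bootstrap classes are inherited
            this.owner = owner;
            this.allowedPrefixes = allowedPrefixes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            for (String prefix : allowedPrefixes) {
                if (name.startsWith(prefix)) {
                    return owner.loadClass(name); // explicitly shared with the application
                }
            }
            return super.loadClass(name, resolve); // own jars and JDK bootstrap classes only
        }
    }

    /** Returns true if the loader can resolve the class name. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = IsolatedLoaderDemo.class.getClassLoader();
        String appClass = IsolatedLoaderDemo.class.getName();
        // No own jars and no allow-listed packages: application classes are invisible.
        try (IsolatingClassLoader isolated = new IsolatingClassLoader(new URL[0], app)) {
            System.out.println("application loader sees it: " + canSee(app, appClass));
            System.out.println("isolated loader sees it: " + canSee(isolated, appClass));
            System.out.println("isolated loader sees JDK: " + canSee(isolated, "java.lang.String"));
        }
    }
}
```

Under this model, a jar on the application class path is invisible to the isolated loader unless its package is allow-listed, whereas a jar added to the loader's own URL list (the analogue of the plugin's own folder) is found.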

However, my checkpoints still fail intermittently (about 10% of the time).
Whenever one fails, incomplete files are left in S3 (see the attached
screenshot below).
I'm not sure whether those incomplete files are expected or a bug.

Thanks and regards,
Averell
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/Screen_Shot_2020-08-28_at_11.png>




Re: SAX2 driver class org.apache.xerces.parsers.SAXParser not found

Averell
Hello Robert,

I'm not sure why the screenshot I attached in the previous post was not
shown. I'm trying to re-attach in this post.
As shown in this screenshot, part-1-33, part-1-34, and part-1-35 have
already been closed, but the temp file for part-1-33 is still there.

Thanks and regards
Averell
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/FlinkFileSink.png>


