S3A "Data read has a different length than the expected" issue root cause


spoganshev
In case you experience an exception similar to the following:

org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=53562; expectedLength=65536; includeSkipped=true; in.getClass()=class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
        at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.checkLength(LengthCheckInputStream.java:151)
        at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:93)
        at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:76)
        at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.closeStream(S3AInputStream.java:529)
        at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.close(S3AInputStream.java:490)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.flink.fs.s3.common.hadoop.HadoopDataInputStream.close(HadoopDataInputStream.java:89)
        at org.apache.flink.api.common.io.FileInputFormat.close(FileInputFormat.java:861)
        at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:206)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
        at java.lang.Thread.run(Thread.java:748)


The root cause is a bug in Hadoop's S3A filesystem implementation:
https://issues.apache.org/jira/browse/HADOOP-16767

Until a fix is available, a hacky workaround is to ship a patched copy of the
S3AInputStream class, together with all the classes that it requires, and use
it in a custom filesystem implementation.
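
To illustrate what such a patched class would need to do differently, here is a minimal, self-contained sketch. It assumes (this is my reading of the stack trace, not something stated above) that the close path reads the remaining bytes of the stream and only handles IOException, so the unchecked SdkClientException escapes from close(). All names here (DrainOrAbortSketch, AbortableStream, TruncatedStream) are hypothetical, and IllegalStateException merely stands in for SdkClientException; none of this is the actual Hadoop or AWS SDK code.

```java
import java.io.IOException;
import java.io.InputStream;

public class DrainOrAbortSketch {

    /** Hypothetical stand-in for an S3 object stream that can be aborted. */
    static abstract class AbortableStream extends InputStream {
        boolean aborted = false;

        void abort() {
            aborted = true;
        }
    }

    /**
     * Delivers fewer bytes than promised and then fails with an unchecked
     * exception, mimicking the length check seen in the stack trace.
     */
    static class TruncatedStream extends AbortableStream {
        private final int expected;
        private final int available;
        private int pos = 0;

        TruncatedStream(int expected, int available) {
            this.expected = expected;
            this.available = available;
        }

        @Override
        public int read() {
            if (pos < available) {
                pos++;
                return 0;
            }
            if (pos < expected) {
                // Stand-in for SdkClientException (also a RuntimeException).
                throw new IllegalStateException(
                        "Data read has a different length than the expected: dataLength="
                                + pos + "; expectedLength=" + expected);
            }
            return -1;
        }
    }

    /**
     * Drains and closes the stream. Any exception, checked or unchecked,
     * escalates to abort() instead of propagating to the caller.
     * Returns true if the stream was closed cleanly.
     */
    static boolean closeStream(AbortableStream in) {
        try {
            while (in.read() >= 0) {
                // drain the remaining bytes
            }
            in.close();
            return true;
        } catch (Exception e) { // catching only IOException would let the failure escape
            in.abort();
            return false;
        }
    }

    public static void main(String[] args) {
        AbortableStream broken = new TruncatedStream(65536, 53562);
        boolean clean = closeStream(broken);
        System.out.println("clean=" + clean + " aborted=" + broken.aborted);
    }
}
```

With the numbers from the exception above (53562 bytes delivered out of 65536 expected), the sketch aborts the stream instead of throwing from close(). A real replacement would have to apply the same idea inside the shaded S3AInputStream.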



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: S3A "Data read has a different length than the expected" issue root cause

Kostas Kloudas-2
Thanks a lot for reporting this!

I believe that this can be really useful for the community!

Cheers,
Kostas

On Tue, Dec 17, 2019 at 1:29 PM spoganshev <[hidden email]> wrote:

> In case you experience an exception similar to the following:
> [...]
> The root cause is a bug in Hadoop's S3A filesystem implementation:
> https://issues.apache.org/jira/browse/HADOOP-16767