In case you experience an exception similar to the following:

org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=53562; expectedLength=65536; includeSkipped=true; in.getClass()=class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.checkLength(LengthCheckInputStream.java:151)
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:93)
    at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:76)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.closeStream(S3AInputStream.java:529)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.close(S3AInputStream.java:490)
    at java.io.FilterInputStream.close(FilterInputStream.java:181)
    at org.apache.flink.fs.s3.common.hadoop.HadoopDataInputStream.close(HadoopDataInputStream.java:89)
    at org.apache.flink.api.common.io.FileInputFormat.close(FileInputFormat.java:861)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:206)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)

The root cause is a bug in Hadoop's S3A filesystem implementation:
https://issues.apache.org/jira/browse/HADOOP-16767

A temporary, hacky workaround is to copy the S3AInputStream class (and the classes it depends on), patch it, and use it in a custom filesystem implementation.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
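To sketch what deploying such a replacement might look like: Flink (since 1.9) loads each subdirectory of its plugins/ directory in a separate class loader, so a rebuilt filesystem jar can stand in for the bundled S3 support without modifying flink-dist. The paths and jar name below are placeholders, not taken from the report above:

```shell
# Sketch: installing a patched S3 filesystem as a Flink plugin.
# FLINK_HOME and the jar name are placeholder assumptions.
FLINK_HOME="${FLINK_HOME:-$(mktemp -d)}"   # your Flink installation directory

# Stand-in for a jar built from the copied and patched S3AInputStream
# sources; in practice this would be your rebuilt filesystem artifact.
touch patched-s3-fs.jar

# Each plugins/ subdirectory gets its own class loader, so the patched
# filesystem implementation does not clash with classes on the main classpath.
mkdir -p "$FLINK_HOME/plugins/s3-fs-patched"
mv patched-s3-fs.jar "$FLINK_HOME/plugins/s3-fs-patched/"
```

After restarting the cluster, the custom filesystem implementation in that jar would be picked up in place of the bundled one.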
Thanks a lot for reporting this!
I believe that this can be really useful for the community!

Cheers,
Kostas

On Tue, Dec 17, 2019 at 1:29 PM spoganshev <[hidden email]> wrote: