Discrepancy between the part length file's length and the part file length during recovery

Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
Hello folks,
                 I have Flink 1.7.2 working with Hadoop 2.6, and because there is no built-in truncate in Hadoop 2.6, I am writing a method to clean up (truncate) part files based on the length in the valid-length files that Flink drops during restore, roughly along the lines of the sketch below.
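(A minimal sketch of the idea, not the actual code: since Hadoop 2.6 has no FileSystem#truncate, it copies the first N bytes to a temp file and renames it over the original. Class and helper names here are illustrative.)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartFileCleanup {

    // Reads the decimal length that Flink wrote into the valid-length file.
    static long readValidLength(FileSystem fs, Path validLengthFile) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(fs.open(validLengthFile), StandardCharsets.UTF_8))) {
            return Long.parseLong(r.readLine().trim());
        }
    }

    // "Truncates" partFile to validLength without FileSystem#truncate:
    // copy the first validLength bytes to a temp file, then rename it
    // over the original.
    static void truncateToValidLength(FileSystem fs, Path partFile, long validLength)
            throws IOException {
        Path tmp = new Path(partFile.getParent(), "." + partFile.getName() + ".truncating");
        try (FSDataInputStream in = fs.open(partFile);
             FSDataOutputStream out = fs.create(tmp, true)) {
            byte[] buf = new byte[64 * 1024];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break; // fewer readable bytes than validLength
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        fs.delete(partFile, false);
        fs.rename(tmp, partFile);
    }
}

But I see something very strange: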

hadoop fs -cat hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
1765887805

hadoop fs -ls hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
-rw-r--r--   3 root hadoop 1280845815 2019-03-07 16:00 hdfs://**********/dt=2019-03-07/part-9-0

I see the valid-length file reporting a larger length than the part file itself.

Any clue why that would be the case?

Regards.



Re: Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
This seems strange. When I pull the part file to the local FS (copyToLocal), it has the same length as reported by the valid-length file; the FileStatus from Hadoop seems to have the wrong length. This appears to be true for all of these discrepancies. Could it be that the block information did not get updated?

Either way, I am wondering whether the recovery (the one that does the truncate) should go by the length in the valid-length file or by the length reported by the FileStatus.
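(For anyone who wants to reproduce the comparison, a sketch; it assumes HdfsDataInputStream#getVisibleLength is the client call that reports the readable length, including a last block that was never finalized. The class name and flow are illustrative.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class LengthCheck {
    public static void main(String[] args) throws IOException {
        Path part = new Path(args[0]); // the part file to inspect
        FileSystem fs = FileSystem.get(part.toUri(), new Configuration());

        // What the NameNode metadata says (what `hadoop fs -ls` prints).
        long statusLen = fs.getFileStatus(part).getLen();

        // What is actually readable from the DataNodes, including a last
        // block left under construction by a writer that never closed.
        try (FSDataInputStream in = fs.open(part)) {
            long visibleLen = (in instanceof HdfsDataInputStream)
                    ? ((HdfsDataInputStream) in).getVisibleLength()
                    : statusLen;
            System.out.println("FileStatus length: " + statusLen);
            System.out.println("Visible length:    " + visibleLen);
        }
    }
}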



Re: Discrepancy between the part length file's length and the part file length during recovery

Paul Lam
Hi Vishal,

I’ve come across the same problem. By default, the file length on the NameNode is not updated when the output stream is not closed properly.
I modified the writer to update the file length on each flush, but that comes with some overhead, so this approach should be used only when strong consistency is required.
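(A sketch of the general mechanism, not necessarily the actual patch: HDFS can persist the current length on the NameNode when hsync is called with the UPDATE_LENGTH flag, which is where the per-flush overhead comes from.)

import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class LengthUpdatingFlush {
    // Flush that also records the current file length on the NameNode, so
    // FileStatus stays accurate even if the stream is never closed cleanly.
    // The UPDATE_LENGTH round trip is what makes this more expensive than
    // a plain hflush/hsync.
    static void flushAndUpdateLength(FSDataOutputStream out) throws IOException {
        if (out instanceof HdfsDataOutputStream) {
            ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        } else {
            out.hsync(); // non-HDFS streams: plain durability, no length update
        }
    }
}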

I’ve just filed a ticket [1]; please take a look.


Best,
Paul Lam





Re: Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
Thank you for your email. 

Would it then be correct to assume that this situation (the valid-length file reporting a length greater than the part file size reported by FileStatus on the NN) is attributable only to this edge case?
Or do you see a case where, even though the above is true, the part file would still need truncation as and when the FileStatus on the NN catches up?







Re: Discrepancy between the part length file's length and the part file length during recovery

Paul Lam
Hi,

Would it then be correct to assume that this situation (the valid-length file reporting a length greater than the part file size reported by FileStatus on the NN) is attributable only to this edge case?

Yes, I think so.

Or do you see a case where, even though the above is true, the part file would still need truncation as and when the FileStatus on the NN catches up?

Actually, most of the time the file does need truncation, and I’ve set up a cron job to do this.
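(For what it's worth, such a job might look roughly like this sketch. It is illustrative only: it assumes the _<part>.valid-length naming from this thread, and something like the earlier truncateToValidLength helper would do the actual rewrite.)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthSweep {
    public static void main(String[] args) throws IOException {
        Path dir = new Path(args[0]); // e.g. one dt=... partition directory
        FileSystem fs = FileSystem.get(dir.toUri(), new Configuration());

        for (FileStatus st : fs.listStatus(dir)) {
            String name = st.getPath().getName();
            // Flink's marker files look like _<part-name>.valid-length.
            if (!name.startsWith("_") || !name.endsWith(".valid-length")) continue;

            Path part = new Path(dir,
                    name.substring(1, name.length() - ".valid-length".length()));
            long validLen;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(fs.open(st.getPath()), StandardCharsets.UTF_8))) {
                validLen = Long.parseLong(r.readLine().trim());
            }
            long reportedLen = fs.getFileStatus(part).getLen();

            // Truncate only when more bytes exist than the checkpointed valid
            // length; the stale-FileStatus edge case discussed above
            // (validLen > reportedLen) needs no truncation.
            if (reportedLen > validLen) {
                System.out.println("Truncating " + part + " to " + validLen);
                // truncateToValidLength(fs, part, validLen); // from the earlier sketch
            }
        }
    }
}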

Best,
Paul Lam
