Discrepancy between the part length file's length and the part file length during recovery

Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
Hello folks,
                 I have Flink 1.7.2 working with Hadoop 2.6, and because there is no built-in truncate in Hadoop 2.6, I am writing a method to clean up (truncate) part files based on the length in the valid-length files that Flink drops during restore, roughly along the lines of the sketch below.
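(A minimal sketch of the idea, not the actual code: since Hadoop 2.6 has no FileSystem#truncate, it copies the first N bytes to a temp file and renames it over the original. Class and helper names here are illustrative.)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartFileCleanup {

    // Reads the decimal length that Flink wrote into the valid-length file.
    static long readValidLength(FileSystem fs, Path validLengthFile) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(fs.open(validLengthFile), StandardCharsets.UTF_8))) {
            return Long.parseLong(r.readLine().trim());
        }
    }

    // "Truncates" partFile to validLength without FileSystem#truncate:
    // copy the first validLength bytes to a temp file, then rename it
    // over the original.
    static void truncateToValidLength(FileSystem fs, Path partFile, long validLength)
            throws IOException {
        Path tmp = new Path(partFile.getParent(), "." + partFile.getName() + ".truncating");
        try (FSDataInputStream in = fs.open(partFile);
             FSDataOutputStream out = fs.create(tmp, true)) {
            byte[] buf = new byte[64 * 1024];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break; // fewer readable bytes than validLength
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        fs.delete(partFile, false);
        fs.rename(tmp, partFile);
    }
}

But I see something very strange: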

hadoop fs -cat hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
1765887805

hadoop fs -ls hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
-rw-r--r--   3 root hadoop 1280845815 2019-03-07 16:00 hdfs://**********/dt=2019-03-07/part-9-0

I see the valid-length file reporting a larger length than the part file itself.

Any clue why that would be the case?

Regards.



Re: Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
This seems strange. When I pull the part file to the local FS (copyToLocal), it has the same length as reported by the valid-length file; the FileStatus from Hadoop seems to have the wrong length. This appears to be true for all of these discrepancies. Could it be that the block information did not get updated?

Either way, I am wondering whether the recovery (the one that does the truncate) should go by the length in the valid-length file or by the length reported by the FileStatus.
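(For anyone who wants to reproduce the comparison, a sketch; it assumes HdfsDataInputStream#getVisibleLength is the client call that reports the readable length, including a last block that was never finalized. The class name and flow are illustrative.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class LengthCheck {
    public static void main(String[] args) throws IOException {
        Path part = new Path(args[0]); // the part file to inspect
        FileSystem fs = FileSystem.get(part.toUri(), new Configuration());

        // What the NameNode metadata says (what `hadoop fs -ls` prints).
        long statusLen = fs.getFileStatus(part).getLen();

        // What is actually readable from the DataNodes, including a last
        // block left under construction by a writer that never closed.
        try (FSDataInputStream in = fs.open(part)) {
            long visibleLen = (in instanceof HdfsDataInputStream)
                    ? ((HdfsDataInputStream) in).getVisibleLength()
                    : statusLen;
            System.out.println("FileStatus length: " + statusLen);
            System.out.println("Visible length:    " + visibleLen);
        }
    }
}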



Re: Discrepancy between the part length file's length and the part file length during recovery

Paul Lam
Hi Vishal,

I’ve come across the same problem. By default, the file length on the NameNode is not updated when the output stream is not closed properly.
I modified the writer to update the file length on each flush, but that comes with some overhead, so this approach should be used only when strong consistency is required.
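(A sketch of the general mechanism, not necessarily the actual patch: HDFS can persist the current length on the NameNode when hsync is called with the UPDATE_LENGTH flag, which is where the per-flush overhead comes from.)

import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class LengthUpdatingFlush {
    // Flush that also records the current file length on the NameNode, so
    // FileStatus stays accurate even if the stream is never closed cleanly.
    // The UPDATE_LENGTH round trip is what makes this more expensive than
    // a plain hflush/hsync.
    static void flushAndUpdateLength(FSDataOutputStream out) throws IOException {
        if (out instanceof HdfsDataOutputStream) {
            ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        } else {
            out.hsync(); // non-HDFS streams: plain durability, no length update
        }
    }
}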

I’ve just filed a ticket [1]; please take a look.


Best,
Paul Lam





Re: Discrepancy between the part length file's length and the part file length during recovery

Vishal Santoshi
Thank you for your email. 

Would it then be correct to assume that this situation (the valid-length file reporting a length greater than the part file size reported by FileStatus on the NN) is attributable only to this edge case?
Or do you see a case where, even though the above is true, the part file would still need truncation as and when the FileStatus on the NN catches up?







Re: Discrepancy between the part length file's length and the part file length during recovery

Paul Lam
Hi,

Would it then be correct to assume that this situation (the valid-length file reporting a length greater than the part file size reported by FileStatus on the NN) is attributable only to this edge case?

Yes, I think so.

Or do you see a case where, even though the above is true, the part file would still need truncation as and when the FileStatus on the NN catches up?

Actually, most of the time the file does need truncation, and I’ve set up a cron job to do this.
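(For what it's worth, such a job might look roughly like this sketch. It is illustrative only: it assumes the _<part>.valid-length naming from this thread, and something like the earlier truncateToValidLength helper would do the actual rewrite.)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValidLengthSweep {
    public static void main(String[] args) throws IOException {
        Path dir = new Path(args[0]); // e.g. one dt=... partition directory
        FileSystem fs = FileSystem.get(dir.toUri(), new Configuration());

        for (FileStatus st : fs.listStatus(dir)) {
            String name = st.getPath().getName();
            // Flink's marker files look like _<part-name>.valid-length.
            if (!name.startsWith("_") || !name.endsWith(".valid-length")) continue;

            Path part = new Path(dir,
                    name.substring(1, name.length() - ".valid-length".length()));
            long validLen;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(fs.open(st.getPath()), StandardCharsets.UTF_8))) {
                validLen = Long.parseLong(r.readLine().trim());
            }
            long reportedLen = fs.getFileStatus(part).getLen();

            // Truncate only when more bytes exist than the checkpointed valid
            // length; the stale-FileStatus edge case discussed above
            // (validLen > reportedLen) needs no truncation.
            if (reportedLen > validLen) {
                System.out.println("Truncating " + part + " to " + validLen);
                // truncateToValidLength(fs, part, validLen); // from the earlier sketch
            }
        }
    }
}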

Best,
Paul Lam
