Hello folks,
I have Flink 1.7.2 running against Hadoop 2.6, and because Hadoop 2.6 has no built-in truncate, I am writing a method to clean up (truncate) part files based on the length recorded in the valid-length files that Flink drops during restore. I am seeing something very strange:

hadoop fs -cat hdfs://n*********/*******/dt=2019-03-07/_part-9-0.valid-length
1765887805

hadoop fs -ls hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-03-07/part-9-0
-rw-r--r-- 3 root hadoop 1280845815 2019-03-07 16:00 hdfs://**********/dt=2019-03-07/part-9-0

The valid-length file reports a larger length (1765887805) than the part file itself (1280845815). Any clue why that would be the case?

Regards.
|
This seems strange. When I pull the part file to the local FS (copyToLocal), it has the same length as reported by the valid-length file, so the FileStatus from Hadoop seems to report a wrong length. This seems to be true for all discrepancies of this type. Could it be that the block information did not get updated? Either way, I am wondering whether the recovery (the step that does the truncate) needs to go by the length in the valid-length file or by the length reported by FileStatus.
On Thu, Mar 7, 2019 at 5:00 PM Vishal Santoshi <[hidden email]> wrote:
|
Hi Vishal,
I’ve come across the same problem. By default, the file length is not updated when the output stream is not closed properly. I modified the writer to update the file length on each flush, but that comes with some overhead, so the approach should only be used when strong consistency is required. I’ve just filed a ticket [1]; please take a look.
Best,
Paul Lam
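
For readers following along: since the thread suggests the length reported by file metadata can lag behind what is actually readable, one option is for a cleanup helper to count the readable bytes itself and compare that count with the valid-length value. Below is a minimal sketch of that decision in Python, using local files as a stand-in for HDFS. The names readable_length and decide are hypothetical and are not part of Flink or Hadoop.

```python
import os
import tempfile


def readable_length(path, chunk_size=1 << 20):
    """Count bytes by reading the file to the end, ignoring size metadata."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)


def decide(valid_length, readable):
    """Hypothetical helper: compare the valid-length value with readable bytes."""
    if readable > valid_length:
        return "truncate"  # bytes past the valid length should be dropped
    if readable == valid_length:
        return "ok"        # nothing to do
    return "short"         # fewer bytes than expected: leave the file alone


if __name__ == "__main__":
    # Local demo: a 10-byte "part file" whose valid length is assumed to be 7.
    with tempfile.TemporaryDirectory() as d:
        part = os.path.join(d, "part-9-0")
        with open(part, "wb") as f:
            f.write(b"x" * 10)
        print(decide(7, readable_length(part)))  # prints: truncate
```

Reading the whole file just to learn its length is expensive for files of the size shown earlier in the thread, so this is only meant to illustrate the comparison, not to suggest how it should be done at scale.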
|
Thank you for your email. Would it then be correct to assume that this situation (the length in the valid-length file > the part file size reported by FileStatus on the NN) is attributable only to this edge case? Or do you see a case where, even though the above is true, the part file would still need truncation once the FileStatus on the NN recovers?
On Tue, Mar 26, 2019 at 9:10 AM Paul Lam <[hidden email]> wrote:
|
Hi,
Yes, I think so.
Actually, most of the time the file needs truncation, and I’ve set up a cronjob to do this.
Best,
Paul Lam
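
For anyone wanting to try a periodic cleanup like the one described above, here is a rough sketch in Python. It is not the actual cronjob from this thread. It works on a local directory as a stand-in for HDFS, pairs each `_<part>.valid-length` marker with its part file (the naming shown earlier in the thread), and truncates by copying the valid prefix to a temporary file and renaming it over the original, since Hadoop 2.6 has no built-in truncate. The function names and the temporary-file suffix are hypothetical. The marker parser keeps only ASCII digits, on the assumption that the marker may carry a few non-digit bytes around the number.

```python
import os
import tempfile

PREFIX, SUFFIX = "_", ".valid-length"  # e.g. _part-9-0.valid-length


def read_valid_length(marker_path):
    """Parse the length from a marker file, keeping only ASCII digits."""
    with open(marker_path, "rb") as f:
        digits = bytes(b for b in f.read() if 48 <= b <= 57)
    return int(digits.decode("ascii"))


def truncate_by_copy(part_path, valid_length, chunk_size=1 << 20):
    """Keep the first valid_length bytes; return True if bytes were dropped."""
    tmp_path = part_path + ".truncate-tmp"  # hypothetical temp name
    remaining = valid_length
    with open(part_path, "rb") as src, open(tmp_path, "wb") as dst:
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than valid_length
            dst.write(chunk)
            remaining -= len(chunk)
        # Extra data exists only if the prefix was complete and more follows.
        has_extra = remaining == 0 and src.read(1) != b""
    if has_extra:
        os.replace(tmp_path, part_path)
    else:
        os.remove(tmp_path)  # nothing to cut, or file too short: leave as-is
    return has_extra


def cleanup_directory(directory):
    """Truncate every part file that has a marker; return the names changed."""
    truncated = []
    for name in sorted(os.listdir(directory)):
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue
        part_name = name[len(PREFIX):-len(SUFFIX)]
        part_path = os.path.join(directory, part_name)
        if not os.path.isfile(part_path):
            continue
        valid_length = read_valid_length(os.path.join(directory, name))
        if truncate_by_copy(part_path, valid_length):
            truncated.append(part_name)
    return truncated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "part-9-0"), "wb") as f:
            f.write(b"0123456789")
        with open(os.path.join(d, "_part-9-0.valid-length"), "wb") as f:
            f.write(b"7")
        print(cleanup_directory(d))  # prints: ['part-9-0']
```

The sketch decides whether to truncate by reading the file rather than by trusting size metadata, which matches the concern raised earlier in the thread. Running it a second time over the same directory changes nothing, so it is safe to schedule repeatedly. On HDFS the same steps would have to be expressed with the Hadoop FileSystem API, which is not shown here.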
|