Hi,

I have written a small program that uses a Twitter input stream and an HDFS output sink. When the files are written to HDFS, each part file in the directory has a .pending extension. I am able to cat the files and see the tweet text. Is it normal for the part files to have a .pending extension?

-rw-r--r-- 3 user supergroup 46399 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-95.pending
-rw-r--r-- 3 user supergroup 54861 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-96.pending
-rw-r--r-- 3 user supergroup 41878 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-97.pending
-rw-r--r-- 3 user supergroup 42813 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-98.pending
-rw-r--r-- 3 user supergroup 42887 2017-09-01 16:35 /flinktwitter/2017-09-01--1635/_part-0-99.pending
BTW, I am using a BucketingSink and a DateTimeBucketer. Do I need to set any other property to move the files out of the .pending state?

BucketingSink<String> sink = new BucketingSink<String>("hdfs://localhost:8020/flinktwitter/");
sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm"));
On Friday, September 1, 2017, 5:03:46 PM PDT, Krishnanand Khambadkone <[hidden email]> wrote:
> Hi, I have written a small program that uses a Twitter input stream and
> a HDFS output sink. When the files are written to HDFS each part file
> in the directory has a .pending extension. I am able to cat the file
> and see the tweet text. Is this normal for the part files to have
> .pending extension.
Hi,
you need to enable checkpointing for your job. Flink uses ".pending" extensions to mark parts that have been completely written, but are not included in a checkpoint yet. Once you enable checkpointing, the .pending extensions will be removed whenever a checkpoint completes.

Regards,
Urs

--
Urs Schönenberger - [hidden email]
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
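
For readers landing on this thread: a minimal sketch of what "enable checkpointing" could look like for the sink from the question. The class name `CheckpointedHdfsSink`, the `tweets` parameter and the 60-second interval are assumptions made for illustration, not taken from the original job, and the fragment needs the Flink streaming and filesystem connector dependencies on the classpath.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// Hypothetical helper class; only the Flink calls come from the thread or the Flink API.
public class CheckpointedHdfsSink {

    /**
     * Enables checkpointing on the environment and attaches the BucketingSink
     * from the question to an existing stream of tweet text.
     */
    public static void attach(StreamExecutionEnvironment env, DataStream<String> tweets) {
        // Assumed interval: one checkpoint every 60 s. Pending part files are
        // only finalized when a checkpoint completes, so without this call
        // they keep their .pending names.
        env.enableCheckpointing(60_000L);

        // Same sink configuration as in the question.
        BucketingSink<String> sink =
                new BucketingSink<>("hdfs://localhost:8020/flinktwitter/");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm"));

        tweets.addSink(sink);
    }
}
```

With a setup like this, part files should still show up as `_part-*.pending` for a short while after they are closed, and lose the prefix and suffix once the next checkpoint completes.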
Yes, I enabled checkpointing and now the files no longer have the .pending extension. Thank you, Urs.
On Saturday, September 2, 2017, 3:10:28 AM PDT, Urs Schoenenberger <[hidden email]> wrote:
> you need to enable checkpointing for your job. Flink uses ".pending"
> extensions to mark parts that have been completely written, but are not
> included in a checkpoint yet. Once you enable checkpointing, the .pending
> extensions will be removed whenever a checkpoint completes.
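
The behaviour discussed in this thread (closed part files wait under a pending name and are renamed when a checkpoint completes) can be modelled in plain Java. This is a toy model on the local filesystem, not Flink's implementation: the class `PendingFilesDemo` and its methods are hypothetical, and it ignores details such as checkpoint IDs and in-progress files. Only the `_` prefix and `.pending` suffix are taken from the listing in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Toy model of the pending -> final rename; not Flink code.
public class PendingFilesDemo {
    public static final String PENDING_PREFIX = "_";
    public static final String PENDING_SUFFIX = ".pending";

    private final Path bucketDir;
    private final List<Path> pending = new ArrayList<>();

    public PendingFilesDemo(Path bucketDir) {
        this.bucketDir = bucketDir;
    }

    /** Writes a completed part file under its pending name and remembers it. */
    public Path closePart(String partName, String content) throws IOException {
        Path file = bucketDir.resolve(PENDING_PREFIX + partName + PENDING_SUFFIX);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        pending.add(file);
        return file;
    }

    /** Simulates a completed checkpoint: every pending file gets its final name. */
    public List<Path> onCheckpointComplete() throws IOException {
        List<Path> finalized = new ArrayList<>();
        for (Path file : pending) {
            String name = file.getFileName().toString();
            // Strip the leading "_" and the trailing ".pending".
            String finalName = name.substring(
                    PENDING_PREFIX.length(), name.length() - PENDING_SUFFIX.length());
            finalized.add(Files.move(file, file.resolveSibling(finalName)));
        }
        pending.clear();
        return finalized;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("flinktwitter");
        PendingFilesDemo sink = new PendingFilesDemo(dir);
        System.out.println("closed:    " + sink.closePart("part-0-95", "tweet text\n").getFileName());
        System.out.println("finalized: " + sink.onCheckpointComplete().get(0).getFileName());
    }
}
```

If `onCheckpointComplete()` is never called, which corresponds to a job running without checkpointing, the files simply stay under their pending names, matching the listing in the original question.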