Help required - "BucketingSink" usage to write HDFS Files

Help required - "BucketingSink" usage to write HDFS Files

Raja.Aravapalli

Hi,

I am working on a PoC that writes to HDFS files using the BucketingSink class. Even though the data is being written to the HDFS files, the files remain on HDFS with the “.pending” suffix.

Below is the code I am using. Can someone please help me identify and fix the issue?

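import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

BucketingSink<String> hdfsSink = new BucketingSink<String>("hdfs://xxxx/xxxx/xxxx/Test/");
hdfsSink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm")); // one bucket directory per minute
hdfsSink.setBatchSize(1024 * 1024 * 2);          // roll part files at 2 MB
hdfsSink.setInactiveBucketCheckInterval(10000L); // check for inactive buckets every 10 s
hdfsSink.setInactiveBucketThreshold(10000L);     // close part files not written to for 10 s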

Thanks a lot.

Regards,

Raja.

Re: Help required - "BucketingSink" usage to write HDFS Files

Vinay Patil
Hi Raja,

Have you enabled checkpointing?

The part files are rolled when the batch size is reached (2 MB in your case) or when the bucket has been inactive for a certain amount of time, but they only move from the pending to the finished state when a checkpoint completes.


Regards,
Vinay Patil
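
The two roll triggers described above can be illustrated with a small, self-contained model. This is a hypothetical simplification: the class RollPolicyModel and its methods are invented for illustration and are not Flink's BucketingSink internals. A bucket is reduced to a byte counter plus a last-write timestamp; write() rolls once the batch size is reached, and checkInactive() stands in for the periodic check configured by setInactiveBucketCheckInterval and setInactiveBucketThreshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the two roll conditions; not Flink's BucketingSink code.
class RollPolicyModel {
    private final long batchSizeBytes;      // e.g. 1024 * 1024 * 2 in the question
    private final long inactiveThresholdMs; // e.g. 10000L in the question

    private long bytesInPart = 0;
    private long lastWriteMs = 0;
    private int partIndex = 0;
    private final List<String> rolledParts = new ArrayList<>();

    RollPolicyModel(long batchSizeBytes, long inactiveThresholdMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.inactiveThresholdMs = inactiveThresholdMs;
    }

    // Condition 1: roll as soon as the current part file reaches the batch size.
    void write(long recordBytes, long nowMs) {
        bytesInPart += recordBytes;
        lastWriteMs = nowMs;
        if (bytesInPart >= batchSizeBytes) {
            roll();
        }
    }

    // Condition 2: a periodic check rolls a non-empty part file whose bucket has been idle long enough.
    void checkInactive(long nowMs) {
        if (bytesInPart > 0 && nowMs - lastWriteMs >= inactiveThresholdMs) {
            roll();
        }
    }

    // Records the closed part file and starts counting bytes for the next one.
    private void roll() {
        rolledParts.add("part-0-" + partIndex + " (" + bytesInPart + " bytes)");
        partIndex++;
        bytesInPart = 0;
    }

    List<String> rolledParts() {
        return rolledParts;
    }
}
```

For example, with a 100-byte batch size and a 10 s threshold, two 60-byte writes roll part-0-0, and a later 30-byte write is rolled into part-0-1 only after the bucket has been idle for 10 s. Rolling only closes a part file: in BucketingSink a closed part file is renamed with the .pending suffix, which is exactly where the files in this thread stopped.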

Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Raja.Aravapalli

Hi Vinay,

Thanks for the response.

I have NOT enabled any checkpointing.

The files are rolling over correctly every 2 MB, but they remain in the pending state, as shown below:

-rw-r--r--   3 2097424 2017-08-06 21:10 /xxxx/xxxx/xxxx/Test/part-0-0.pending
-rw-r--r--   3 1431430 2017-08-06 21:12 /xxxx/xxxx/xxxx/Test/part-0-1.pending

Regards,

Raja.

Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Vinay Patil
Hi Raja,

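That is why they remain in the pending state. You can enable checkpointing by calling env.enableCheckpointing(<interval>) on your StreamExecutionEnvironment, where the interval is given in milliseconds.

Once checkpointing is enabled, the files will no longer remain in the pending state.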


Regards,
Vinay Patil
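
Why checkpointing is the missing piece can be sketched with a second hypothetical model (PartFileLifecycle and its methods are illustrative names, not Flink API). It assumes the behaviour BucketingSink documents: a rolled part file sits in the pending state until a checkpoint that covers it completes, and only then is it finalized. Without checkpointing, that completion step never happens, so the files stay pending as in Raja's listing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of part-file states; not Flink's BucketingSink code.
class PartFileLifecycle {
    enum State { IN_PROGRESS, PENDING, FINISHED }

    private final Map<String, State> files = new LinkedHashMap<>();
    // Files rolled since the last checkpoint snapshot.
    private final List<String> pendingSinceSnapshot = new ArrayList<>();
    // Pending files remembered by each checkpoint, keyed by checkpoint id.
    private final Map<Long, List<String>> pendingPerCheckpoint = new LinkedHashMap<>();

    void open(String name) {
        files.put(name, State.IN_PROGRESS);
    }

    // Rolling (batch size or inactivity) closes the file but only marks it pending.
    void roll(String name) {
        files.put(name, State.PENDING);
        pendingSinceSnapshot.add(name);
    }

    // Taking a checkpoint records which pending files it covers.
    void snapshot(long checkpointId) {
        pendingPerCheckpoint.put(checkpointId, new ArrayList<>(pendingSinceSnapshot));
        pendingSinceSnapshot.clear();
    }

    // A completed checkpoint finalizes the pending files recorded by it or by any earlier checkpoint.
    void notifyCheckpointComplete(long checkpointId) {
        Iterator<Map.Entry<Long, List<String>>> it = pendingPerCheckpoint.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<String>> entry = it.next();
            if (entry.getKey() <= checkpointId) {
                for (String name : entry.getValue()) {
                    files.put(name, State.FINISHED);
                }
                it.remove();
            }
        }
    }

    State stateOf(String name) {
        return files.get(name);
    }
}
```

With checkpointing disabled, snapshot() and notifyCheckpointComplete() are simply never called, so every rolled file stays PENDING. A file rolled after a snapshot is not finalized by that snapshot's completion; it waits for the next checkpoint.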

Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Raja.Aravapalli

Thanks very much for the pointers, Vinay. That helps.

-Raja.