Hi,
We are using StreamingFileSink with a custom BucketAssigner and DefaultRollingPolicy. The custom BucketAssigner is simply a date bucket assigner. The StreamingFileSink creates part files named "part-<subtask_number>-<count_of_the_bucket_created_by_that_subtask>". The count is an integer that increments on each rollover. My questions are:

1. When does this count reset to 0?
2. Is there a way I can reset this count programmatically?

Since we are using a day bucket, we would like the count to reset every day. We are using Flink 1.8.

Thanks,
Sidhartha
Hi Sidhartha,

Currently, the part counter is never reset to 0, and it is not possible to customize the part file name, so I don't think there is any way to reset it right now. I guess the reason it can't be reset to 0 is the concern that previously written parts would be overwritten. Although the bucket id is part of the part file path, StreamingFileSink does not know when the bucket id will change when a custom BucketAssigner is used.

Best,
Haibo

At 2019-07-30 06:13:54, "sidhartha saurav" <[hidden email]> wrote:
Hi Sidhartha,

This is a general limitation for now, because Flink does not keep a counter for every bucket, only a global one. Flink assumes that the sink can write to any bucket at any time, so the counter is never reset, to avoid overwriting a previously written part file number 0.

Best,
Andrey

On Tue, Jul 30, 2019 at 7:01 AM Haibo Sun <[hidden email]> wrote:
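To make the behaviour described in this reply concrete, here is a minimal Java sketch of a single counter shared by all buckets of one subtask. It is only a simplified illustration, not Flink's actual implementation: the class `GlobalPartCounterSketch` and the method `rollNewPart` are hypothetical names made up for this example.

```java
// Simplified illustration only; this is NOT Flink's StreamingFileSink code.
// All names here (GlobalPartCounterSketch, rollNewPart) are hypothetical.
public class GlobalPartCounterSketch {
    // One counter per subtask, shared by every bucket that subtask writes to.
    private long partCounter = 0;
    private final int subtaskIndex;

    public GlobalPartCounterSketch(int subtaskIndex) {
        this.subtaskIndex = subtaskIndex;
    }

    // Returns the path of the next part file and advances the shared counter.
    public String rollNewPart(String bucketId) {
        String path = bucketId + "/part-" + subtaskIndex + "-" + partCounter;
        partCounter++;
        return path;
    }

    public static void main(String[] args) {
        GlobalPartCounterSketch sink = new GlobalPartCounterSketch(0);
        System.out.println(sink.rollNewPart("2019-07-29"));
        System.out.println(sink.rollNewPart("2019-07-29"));
        // A new day bucket starts, but the counter keeps counting.
        System.out.println(sink.rollNewPart("2019-07-30"));
        // A late record for the previous day cannot collide with part-0-0.
        System.out.println(sink.rollNewPart("2019-07-29"));
    }
}
```

Because the counter is never reset, a record that arrives late for an older bucket gets a fresh file name instead of reusing one that already exists in that bucket.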
Thank you for the clarification, Haibo and Andrey.

Is there any limit after which the global counter will reset? I mean, do we have to worry that the counter may get too long and the part file name may exceed the maximum filename length set by the OS, or is that handled by Flink?

Sidhartha

On Tue, Jul 30, 2019, 10:10 AM Andrey Zagrebin <[hidden email]> wrote:
Hi Sidhartha,

I don't think you should worry about this. Currently the `StreamingFileSink` uses a long to keep this counter. The maximum value of a long is 9,223,372,036,854,775,807. The counter would only be reset if the count of files reached that value, which I don't think will happen.

WRT the max filename length: Linux, for example, allows 255 characters for most file systems [1]. That is far larger than the number of digits in the maximum long value.

On Fri, Aug 2, 2019 at 12:24 AM sidhartha saurav <[hidden email]> wrote:
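The length argument can be checked with a few lines of Java. This is a rough sketch under stated assumptions: it builds a name in the "part-<subtask>-<count>" pattern from the original question using the largest possible counter, and assumes (purely for illustration) that the subtask index is at most `Integer.MAX_VALUE`. The class and method names are hypothetical.

```java
// Rough sketch: how long can a "part-<subtask>-<count>" file name get?
// The class and method names are hypothetical, chosen for this example.
public class PartFileNameLength {
    // Builds a part file name with the largest value a long counter can hold.
    static String longestPartFileName(int subtaskIndex) {
        return "part-" + subtaskIndex + "-" + Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Assumption for illustration: the subtask index fits in an int.
        String name = longestPartFileName(Integer.MAX_VALUE);
        System.out.println(name + " has " + name.length() + " characters");
    }
}
```

Even in this worst case the name is 35 characters long, well below the 255-character limit mentioned above.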