We want to change the name of the files generated as the output of our StreamingFileSink.
Currently the generated files are named part-00*; is there a way to change the name? In Hadoop, we can do this with RecordWriters and MultipleOutputs. May I please have some help in this regard? This is a blocker for us and will force us to move to an MR job. Thank you and regards, Dhurandar
Hi Dhurandar, StreamingFileSink currently lets you change the prefix and suffix of the part file name [1], so the name becomes something like <prefix>-0-0<suffix>. Could this solve your problem? Best, Yun [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#part-file-configuration
Yes, we looked at it. The problem is that the file name is generated dynamically: depending on which organization's data we receive, we derive the file name from the incoming data. Is there any way we can achieve this? On Tue, May 12, 2020 at 8:38 PM Yun Gao <[hidden email]> wrote:
Thank you and regards, Dhurandar
Hi, Dhurandar, Best, Jingsong Lee On Thu, May 14, 2020 at 2:05 AM dhurandar S <[hidden email]> wrote:
Hi, just sharing my thoughts. Based on what you have described so far, I think your objective is to have some unique way to identify/filter the output based on the organization. If that's the case, you can implement a BucketAssigner with logic that creates a bucket key from the organization data. Cheers, Sivaprasanna On Thu, May 14, 2020 at 12:13 PM Jingsong Li <[hidden email]> wrote:
In reply to this post by Jingsong Li
Hi Jingsong, We have a system where organizations are added and removed on a regular basis. As new organizations are added, their data starts flowing into the streaming system. We group by the Organisation ID, which is part of the incoming event. If we find an Organisation ID in the incoming stream that we have not seen before, we create a new file and start writing data into it. So the file name is dynamic, i.e. it depends on the incoming stream. Regards, Rahul On Wed, May 13, 2020 at 11:43 PM Jingsong Li <[hidden email]> wrote:
Thank you and regards, Dhurandar
Hi Rahul, Thanks for explaining, I see. Currently there is no way to dynamically control the file name in StreamingFileSink. If the number of organizations is not too large, then, as Sivaprasanna said, you can use a "BucketAssigner" to create a bucket per organization ID. A bucket in StreamingFileSink is like a partition in Hive/Spark: the information is carried in the directory name, so each organization gets its own directory. Best, Jingsong Lee On Tue, May 19, 2020 at 2:03 AM dhurandar S <[hidden email]> wrote:
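To make the BucketAssigner suggestion above a bit more concrete, here is a minimal sketch of the bucket-key logic only, in plain Java with no Flink dependency. The class name `OrgBucketIds`, the method `bucketIdFor`, the `org=` prefix (borrowed from Hive-style partition directories), and the `unknown` fallback are all assumptions made for illustration and do not come from the thread. In a real job, an implementation of Flink's `BucketAssigner<IN, String>` would return a value like this from `getBucketId(element, context)`, so that each organization's part files land in their own directory under the sink's base path.

```java
public final class OrgBucketIds {

    private OrgBucketIds() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Hypothetical helper: maps an organization ID taken from an incoming
     * event to a bucket ID that is safe to use as a directory name.
     */
    public static String bucketIdFor(String orgId) {
        // Events without a usable organization ID go to a catch-all bucket.
        if (orgId == null || orgId.trim().isEmpty()) {
            return "org=unknown";
        }
        // Replace every character outside [A-Za-z0-9._-] so the ID cannot
        // introduce path separators or whitespace into the directory name.
        String safe = orgId.trim().replaceAll("[^A-Za-z0-9._-]", "_");
        return "org=" + safe;
    }

    public static void main(String[] args) {
        // A new organization ID simply yields a new bucket (directory).
        System.out.println(bucketIdFor("acme"));
        System.out.println(bucketIdFor("north/east org"));
    }
}
```

Because the bucket ID is computed per record, a previously unseen organization ID produces a new directory without any change to the job. The part files inside each directory still follow the sink's own part-file naming scheme.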