(DEPRECATED) Apache Flink User Mailing List archive.

BucketingSink capabilities for DataSet API

Posted by Rafi Aroch on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/BucketingSink-capabilities-for-DataSet-API-tp24107.html

Hi,

I'm writing a batch job which reads Parquet files, does some aggregations and writes the results back as Parquet files.

I would like the output to be partitioned by year, month and day, based on event time, similar to the functionality of the BucketingSink.

I was able to read from and write to Parquet by using the hadoop-compatibility features.

I couldn't find a way to partition the data by year, month and day so that a matching folder hierarchy is created. Everything is written to a single directory.
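
To make the target layout concrete, here is a minimal sketch of the bucket path I would like each record to end up under. It only shows how the folder name is derived from the event time and does not use any Flink API. The class and method names (BucketPaths, bucketPath, outputDir) are hypothetical, and it assumes the event time is available as epoch milliseconds, interpreted in UTC, with a plain year/month/day folder naming.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives a year/month/day folder path from an event timestamp.
final class BucketPaths {

    // Assumption: buckets are computed in UTC and named like 2018/09/06.
    private static final DateTimeFormatter BUCKET_FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    private BucketPaths() {
    }

    // Returns the relative bucket folder for an event time given in epoch milliseconds.
    static String bucketPath(long eventTimeMillis) {
        return BUCKET_FORMAT.format(Instant.ofEpochMilli(eventTimeMillis));
    }

    // Joins a base output directory with the bucket folder for the given event time.
    static String outputDir(String baseDir, long eventTimeMillis) {
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/" + bucketPath(eventTimeMillis);
    }
}
```

For example, BucketPaths.outputDir("/data/out", 1536192000000L) gives /data/out/2018/09/06, and all records with an event time on that day should be written under that folder.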

The only thing I could find was an unanswered question about this issue: https://stackoverflow.com/questions/52204034/apache-flink-does-dataset-api-support-writing-output-to-individual-file-partit

Can anyone suggest a way to achieve this? Maybe there's a way to integrate the BucketingSink with the DataSet API? Another solution?

Rafi