Re: CSV sink partitioning and bucketing
Posted by Fabian Hueske-2
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/CSV-sink-partitioning-and-bucketing-tp11694p11697.html
Hi Flavio,
Flink does not come with an OutputFormat that creates buckets. It should not be too hard to implement this in Flink though.
However, if you want a solution fast, I would try the following approach:
1) Search for a Hadoop OutputFormat that buckets Strings based on a key (<Key, String>).
2) Implement a mapper that converts each Row into a String and extracts the key.
3) Use the Hadoop OutputFormat with Flink's HadoopOutputFormat wrapper.
Depending on the output format, you might want to partition and sort the data on the key before writing it out.
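A rough sketch of those three steps, assuming Hadoop's MultipleTextOutputFormat (which routes each distinct key to its own file) and that the bucket key is field 0 of the Row; the output path, class names, and the rowToCsv helper are illustrative, not part of any existing API:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.types.Row;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// 1) A Hadoop OutputFormat that buckets by key: records with the same key
//    end up in the same output directory/file.
class BucketingTextOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        return key.toString() + "/" + name; // e.g. <bucket>/part-00000
    }

    @Override
    protected Text generateActualKey(Text key, Text value) {
        return null; // write only the CSV line, not the key
    }
}

public class BucketedCsvSink {

    // Illustrative helper: naive CSV rendering of a Row (no quoting/escaping).
    private static String rowToCsv(Row r) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < r.getArity(); i++) {
            if (i > 0) sb.append(',');
            sb.append(r.getField(i));
        }
        return sb.toString();
    }

    public static void writeBucketed(DataSet<Row> rows) {
        JobConf conf = new JobConf();
        FileOutputFormat.setOutputPath(conf, new Path("/tmp/buckets")); // illustrative path

        // 3) Wrap the Hadoop OutputFormat in Flink's HadoopOutputFormat.
        HadoopOutputFormat<Text, Text> out =
            new HadoopOutputFormat<>(new BucketingTextOutputFormat(), conf);

        rows
            // 2) Row -> (key, csvLine); assumes the bucket key is field 0
            .map(new MapFunction<Row, Tuple2<Text, Text>>() {
                @Override
                public Tuple2<Text, Text> map(Row r) {
                    return new Tuple2<>(
                        new Text(r.getField(0).toString()),
                        new Text(rowToCsv(r)));
                }
            })
            // optional: co-locate each bucket's records on one task,
            // so each bucket is written by a single writer
            .partitionByHash(0)
            .output(out);
    }
}
```

The partitionByHash(0) call corresponds to the partitioning note above: without it, every parallel task may open a file for every bucket.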
Best, Fabian