Hi to all,
in my use case I'd need to output my Row objects into an output folder as CSV on HDFS, creating/overwriting subfolders based on an attribute (for example, one subfolder per distinct value of a specified column). On top of that, it would be interesting to bucket the data inside those folders by number of lines, i.e. no file inside those directories may contain more than 1000 lines.
For example, if I have a dataset (of Row) containing people, I need to write it as CSV into an output folder X partitioned by year (where each file cannot have more than 1000 rows), like:
X/1990/file1
/1990/file2
/1991/file1
etc..
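To make the expected behaviour concrete, here is a plain-Java sketch of the partition-and-roll logic I'm after (this is just illustration with a hypothetical PartitionedCsvWriter helper, not any Flink API): group the rows by the partition column, write each group under its own subfolder, and start a new file every maxLines rows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration of the desired output layout: <outputDir>/<year>/fileN,
// rolling to a new file whenever the current one reaches maxLines rows.
public class PartitionedCsvWriter {

    public static void write(Path outputDir, List<String[]> rows,
                             int partitionColumn, int maxLines) throws IOException {
        // Group rows by the value of the partition column (e.g. year).
        Map<String, List<String[]>> byKey = new TreeMap<>();
        for (String[] row : rows) {
            byKey.computeIfAbsent(row[partitionColumn], k -> new ArrayList<>()).add(row);
        }
        for (Map.Entry<String, List<String[]>> e : byKey.entrySet()) {
            Path dir = outputDir.resolve(e.getKey());
            Files.createDirectories(dir);
            List<String[]> part = e.getValue();
            // Roll to a new file every maxLines rows: file1, file2, ...
            for (int i = 0; i < part.size(); i += maxLines) {
                int fileIdx = i / maxLines + 1;
                List<String> lines = new ArrayList<>();
                for (String[] row : part.subList(i, Math.min(i + maxLines, part.size()))) {
                    lines.add(String.join(",", row));
                }
                Files.write(dir.resolve("file" + fileIdx), lines);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("X");
        List<String[]> people = Arrays.asList(
                new String[]{"alice", "1990"},
                new String[]{"bob", "1990"},
                new String[]{"carol", "1991"});
        // Partition on column 1 (the year), max 1000 lines per file.
        write(out, people, 1, 1000);
        // Produces out/1990/file1 and out/1991/file1.
    }
}
```

Of course in the real job the writing would have to happen inside a Flink sink against HDFS rather than the local filesystem; the sketch only pins down the folder layout and the 1000-line rolling rule.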
Does something like that exist in Flink?
In principle I could use Hive for this, but at the moment I'd like to avoid adding another component to our pipeline... Moreover, my feeling is that very few people are using Flink with Hive... am I wrong?
Any advice on how to proceed?
Best,
Flavio