Hello guys,
I need to write my batch data (Dataset<Row>) to files. Specifically, I need to:
- split the output into multiple files when it exceeds a size threshold (by line count or maximum size in MB)
- compress the output (if possible without converting to the Hadoop format)
Are there any suggestions or recommendations on how to do this?
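
To make the two requirements concrete, here is a minimal plain-Java sketch of the behavior I have in mind: lines go to gzip-compressed part files, and a new part is started once a line-count threshold is reached. It uses only the JDK, not any framework API. The class name `RollingGzipWriter`, the `part-00000.gz` naming scheme, and the `maxLines` parameter are made up for illustration. A threshold in MB would need byte counting, which is not shown here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: writes lines to gzip part files, starting a new part every maxLines lines. */
public class RollingGzipWriter implements AutoCloseable {
    private final Path dir;
    private final long maxLines;
    private final List<Path> parts = new ArrayList<>();
    private Writer current;       // writer for the part currently being filled, or null
    private long linesInCurrent;  // lines written to the current part so far

    public RollingGzipWriter(Path dir, long maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.dir = dir;
        this.maxLines = maxLines;
    }

    /** Writes one line, rolling over to a new part file if the current one is full. */
    public void writeLine(String line) throws IOException {
        if (current == null || linesInCurrent >= maxLines) {
            roll();
        }
        current.write(line);
        current.write('\n');
        linesInCurrent++;
    }

    /** Closes the current part (if any) and opens the next one. */
    private void roll() throws IOException {
        close();
        Path part = dir.resolve(String.format("part-%05d.gz", parts.size()));
        current = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(part)), StandardCharsets.UTF_8));
        parts.add(part);
        linesInCurrent = 0;
    }

    /** Paths of all part files created so far, in creation order. */
    public List<Path> parts() {
        return parts;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close(); // also finishes the gzip stream
            current = null;
        }
    }
}
```

With `maxLines = 2`, writing five rows through `writeLine` would yield three part files holding 2, 2 and 1 rows. What I am looking for is the equivalent of this when writing a Dataset<Row>.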
Best,
Flavio