(DEPRECATED) Apache Flink User Mailing List archive.

Re: conditional dataset output

Posted by lars.bachmann on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/conditional-dataset-output-tp10532p10549.html

Hi Chesnay,

I actually thought about the same thing, but as you said, it seems a bit
hacky ;-). Anyway, thank you!

Regards,

Lars

On 08.12.2016 16:47, Chesnay Schepler wrote:

> Hello Lars,
>
> The only other way I can think of to do this is to wrap the
> OutputFormat you use in a custom format that calls open() on the
> wrapped OutputFormat only when it receives the first record.
>
> This should work, but it is quite hacky because it interferes with the
> format life cycle.
>
> Regards,
> Chesnay
>
> On 08.12.2016 16:39, [hidden email] wrote:
>> Hi,
>>
>> let's assume I have a dataset that, depending on the input data and
>> the filter operations applied, can be empty. I want to write the
>> dataset to HD, but files should only be created if the dataset is
>> not empty. If the dataset is empty, I don't want any files. The
>> default way, dataset.write(...), always creates as many files as the
>> configured parallelism of this operator, so for an empty dataset all
>> of those files are empty as well. I thought about doing something
>> like:
>>
>> if (dataset.count() > 0) {
>>     dataset.write(...)
>> }
>>
>> but I don't think that's the way to go, because dataset.count()
>> triggers an execution of the (sub)program.
>>
>> Is there a simple way to avoid creating empty files for empty
>> datasets?
>>
>> Regards,
>>
>> Lars
>>
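
For readers of the archive, the wrapping idea Chesnay describes above (defer open() on the wrapped format until the first record arrives) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread and not tested against Flink: SimpleOutputFormat is a hypothetical, reduced stand-in for Flink's OutputFormat interface so that the example compiles on its own, and LazyOpenOutputFormat and RecordingFormat are made-up names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Flink's OutputFormat interface,
// reduced to the life-cycle calls that matter for this sketch.
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

// Wrapper that postpones open() on the wrapped format until the first
// record arrives, so an empty partition never opens the wrapped format.
final class LazyOpenOutputFormat<T> implements SimpleOutputFormat<T> {
    private final SimpleOutputFormat<T> wrapped;
    private int taskNumber;
    private int numTasks;
    private boolean opened;

    LazyOpenOutputFormat(SimpleOutputFormat<T> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void open(int taskNumber, int numTasks) {
        // Only remember the arguments; the wrapped format stays closed.
        this.taskNumber = taskNumber;
        this.numTasks = numTasks;
        this.opened = false;
    }

    @Override
    public void writeRecord(T record) throws IOException {
        if (!opened) {
            // First record: now actually open the wrapped format.
            wrapped.open(taskNumber, numTasks);
            opened = true;
        }
        wrapped.writeRecord(record);
    }

    @Override
    public void close() throws IOException {
        // Close the wrapped format only if it was ever opened.
        if (opened) {
            wrapped.close();
            opened = false;
        }
    }
}

// Format that records the calls it receives, used to show the call order.
final class RecordingFormat implements SimpleOutputFormat<String> {
    final List<String> calls = new ArrayList<>();

    @Override
    public void open(int taskNumber, int numTasks) { calls.add("open"); }

    @Override
    public void writeRecord(String record) { calls.add("write:" + record); }

    @Override
    public void close() { calls.add("close"); }
}
```

In an actual Flink job, a wrapper like this would presumably implement Flink's own OutputFormat, forward configure(...) to the wrapped format as well, and be handed to the dataset's output sink. As noted in the thread, it bends the format life cycle, so whether a particular wrapped format tolerates the delayed open() is something to verify case by case.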