Hey Paulo! I don't think this is possible out of the box at the moment, but
you can try the following workaround:
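1) Create a custom OutputFormat that extends TextOutputFormat and
override its cleanup method, tryCleanupOnError(). TextOutputFormat has no
no-argument constructor, so the subclass has to pass the output path through:
import org.apache.flink.api.java.io.TextOutputFormat;
import org.apache.flink.core.fs.Path;

public class NoCleanupTextOutputFormat<T> extends TextOutputFormat<T> {
    public NoCleanupTextOutputFormat(Path outputPath) {
        super(outputPath);
    }
    @Override
    public void tryCleanupOnError() {
        // ignore cleanup on error so the partial output file is kept
    }
}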
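2) If you look into DataSet.java, writeAsFormattedText is just a map
followed by writeAsText. Instead of calling it, do both steps manually and
pass your own format to output(). DataSet.java wraps the formatter in
clean(), which is protected, so from user code you pass the formatter directly:
dataSet.map(new FormattingMapper<>(formatter))
       .output(new NoCleanupTextOutputFormat<>(..));
FormattingMapper is in org.apache.flink.api.java.functions. If you used the
WriteMode overload of writeAsFormattedText, call setWriteMode(...) on the
format before passing it to output().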
This should keep the partial output in place when the job fails. You can
also open an issue with a feature request to make Flink's TextOutputFormat
configurable so that it skips cleanup on error.
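To make the mechanism concrete, here is a toy model in plain Java with no Flink dependency. It only illustrates the idea behind step 1: a file sink whose failure hook deletes the partial file by default, and a subclass that overrides the hook with a no-op so the file survives. All names here (SimpleFileSink, NoCleanupFileSink, runTask, runAndInspect) are hypothetical and are not Flink's API; in particular, the way runTask calls the hook and then closes the sink in a finally block is an assumption of the model, not a description of Flink's runtime.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class CleanupDemo {

    // Hypothetical stand-in for a file-based output format (NOT Flink's API):
    // writes one record per line and, by default, deletes the file on failure.
    static class SimpleFileSink {
        protected final Path path;
        private BufferedWriter out;

        SimpleFileSink(Path path) { this.path = path; }

        void open() throws IOException {
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }

        void writeRecord(String record) throws IOException {
            out.write(record);
            out.newLine();
        }

        // Idempotent: flushes and closes the writer only if it is still open.
        void close() throws IOException {
            if (out != null) {
                out.close();
                out = null;
            }
        }

        // Failure hook, analogous in spirit to tryCleanupOnError():
        // the default behaviour discards the partial output.
        void tryCleanupOnError() throws IOException {
            close();
            Files.deleteIfExists(path);
        }
    }

    // Counterpart of NoCleanupTextOutputFormat: the hook does nothing,
    // so whatever was written before the failure stays on disk.
    static class NoCleanupFileSink extends SimpleFileSink {
        NoCleanupFileSink(Path path) { super(path); }

        @Override
        void tryCleanupOnError() {
            // ignore cleanup on error
        }
    }

    // Toy runtime (modelling assumption): writes records, fails once index
    // `failAfter` is reached, calls the failure hook, and always closes the sink.
    static void runTask(SimpleFileSink sink, List<String> records, int failAfter) throws IOException {
        sink.open();
        try {
            for (int i = 0; i < records.size(); i++) {
                if (i == failAfter) {
                    throw new IllegalStateException("simulated failure");
                }
                sink.writeRecord(records.get(i));
            }
        } catch (IllegalStateException e) {
            sink.tryCleanupOnError();
        } finally {
            sink.close();
        }
    }

    // Runs the toy task against a fresh temp file and returns the lines left
    // on disk, or Optional.empty() if the file was deleted.
    static Optional<List<String>> runAndInspect(boolean noCleanup, List<String> records, int failAfter) {
        try {
            Path file = Files.createTempDirectory("cleanup-demo").resolve("out.txt");
            SimpleFileSink sink = noCleanup ? new NoCleanupFileSink(file) : new SimpleFileSink(file);
            runTask(sink, records, failAfter);
            return Files.exists(file) ? Optional.of(Files.readAllLines(file)) : Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c", "d");
        System.out.println("default:    " + runAndInspect(false, records, 2));
        System.out.println("no cleanup: " + runAndInspect(true, records, 2));
    }
}
```

Running main prints Optional.empty for the default sink and Optional[[a, b]] for the no-cleanup sink: the records written before the simulated failure are only kept when the hook is overridden.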
Best,
Ufuk
On Tue, Sep 27, 2016 at 10:42 PM, Paulo Cezar wrote:
> Hi Folks,
>
> I was wondering if it's possible to keep partial outputs from dataset
> programs.
> I have a batch pipeline that writes its output to HDFS using
> writeAsFormattedText. When it fails, the output file is deleted, but I would
> like to keep it so that I can generate new inputs for the pipeline and avoid
> reprocessing.
>
> Cheers,
> Paulo Cezar