Re: Is outputting from components other than sinks or side outputs a no-no ?

Posted by David Anderson-3 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Is-outputting-from-components-other-than-sinks-or-side-outputs-a-no-no-tp36943p36944.html

Every job is required to have a sink, but there's no requirement that all output be done via sinks. It's not uncommon for other operators to do I/O, and that doesn't have to cause problems.

What can be problematic, however, is doing blocking I/O. While your user function is blocked, the function will exert back pressure, and checkpoint barriers will be unable to make any progress. This sometimes leads to checkpoint timeouts and job failures. So it's recommended to make any I/O you do asynchronous, using an AsyncFunction [1] or something similar.
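To make the blocking-versus-asynchronous distinction concrete, here is a minimal plain-Java sketch. It does not use Flink at all, and every name in it (AsyncLookupSketch, blockingLookup, asyncLookup, enrichAll) is hypothetical. The slow call is handed to a separate thread pool and a future is returned immediately, so several requests can be in flight at once instead of the caller stalling on each one. In a Flink AsyncFunction the corresponding step would be completing the ResultFuture passed to asyncInvoke from the callback rather than blocking inside the user function.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AsyncLookupSketch {

    // Hypothetical stand-in for a slow, blocking database call.
    static String blockingLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + "-enriched";
    }

    // Runs the blocking call on a separate pool and returns a future right away,
    // so the calling thread is free to issue further requests.
    static CompletableFuture<String> asyncLookup(String key, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(key), pool);
    }

    // Issues all lookups first, then collects the results in input order.
    static List<String> enrichAll(List<String> keys) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> inFlight = keys.stream()
                    .map(k -> asyncLookup(k, pool))
                    .collect(Collectors.toList());
            return inFlight.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(enrichAll(List.of("a", "b", "c")));
    }
}
```

The pool size of 4 and the 50 ms delay are arbitrary values chosen for illustration only.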

Note that the async I/O operator stores the records for in-flight asynchronous requests in checkpoints, and restores and re-triggers those requests when recovering from a failure. This can lead to duplicate results if you use it for non-idempotent database writes. If you need transactions, use a sink that offers them.
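As a rough illustration of why idempotence matters here, the following plain-Java sketch replays the same write twice, as would happen if a request were re-triggered after recovery. The class and its two in-memory "tables" are hypothetical stand-ins for a real database, not anything Flink provides. An append-style write ends up with a duplicate, while a write keyed by record ID does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IdempotentWriteSketch {

    // Hypothetical append-only table: every write adds a row.
    final List<String> appendLog = new ArrayList<>();

    // Hypothetical table keyed by record ID: rewriting the same ID overwrites the row.
    final Map<String, String> keyedTable = new HashMap<>();

    // Non-idempotent: a replayed request produces a second row.
    void append(String recordId, String value) {
        appendLog.add(recordId + "=" + value);
    }

    // Idempotent: a replayed request leaves the table unchanged.
    void upsert(String recordId, String value) {
        keyedTable.put(recordId, value);
    }

    public static void main(String[] args) {
        IdempotentWriteSketch db = new IdempotentWriteSketch();
        // Simulate the original request plus one replay after recovery.
        for (int attempt = 0; attempt < 2; attempt++) {
            db.append("order-1", "paid");
            db.upsert("order-1", "paid");
        }
        System.out.println("append rows: " + db.appendLog.size());
        System.out.println("upsert rows: " + db.keyedTable.size());
    }
}
```

Keying writes this way only removes duplicates from replays. It is not a substitute for a transactional sink if you need atomicity across several writes.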

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html

Best,
David

On Sun, Jul 26, 2020 at 11:08 AM Tom Fennelly <[hidden email]> wrote:
Hi.

What are the negative side effects of (for example) a filter function occasionally making a call out to a DB? Is this a big no-no, and should all outputs be done through sinks and side outputs, no exceptions?

Regards,

Tom.