Hello,
I am using Flink as the query engine for running SQL queries on both batch and streaming data, with the Blink planner in batch mode and streaming mode respectively for the two cases.

In my current setup, I execute batch queries synchronously via the StreamTableEnvironment::execute method. The job uses an OutputFormat in a StreamTableSink to consume results and send them to the user. If an error/exception occurs in the pipeline (possibly due to user code), it is not reported to the OutputFormat or the sink. When an error occurs after the write method on OutputFormat has been invoked, the implementation may falsely assume that the result is successful and complete, since close is called in both the success and failure cases. I can work around this by checking for exceptions thrown by the execute method, but that adds extra latency due to job tear-down cost.

A similar problem exists for streaming jobs. In my setup, streaming jobs are executed asynchronously via StreamExecutionEnvironment::executeAsync. Since the sink interface has no method to receive errors from the pipeline, the user code has to periodically track and manage persistent failures.

Have I missed something in the API? Or is there some other way to get access to the error status in user code?

Regards,
Satyam
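(For reference, the workaround described above, observing the job result returned by executeAsync, could look roughly like the following. This is a minimal sketch assuming the Flink 1.11 JobClient API; the job name and the deliberately failing map function are invented for illustration.)

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class JobErrorMonitoring {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(1, 2, 3)
                    .map(i -> 10 / (i - 2)) // fails at runtime when i == 2
                    .returns(Types.INT)
                    .print();

            // Submit asynchronously; the JobClient is available immediately.
            JobClient client = env.executeAsync("error-monitoring-sketch");

            // In Flink 1.11 this takes the user ClassLoader; from 1.12 on
            // there is a no-arg variant.
            client.getJobExecutionResult(Thread.currentThread().getContextClassLoader())
                    .whenComplete((result, error) -> {
                        if (error != null) {
                            // Pipeline failures surface here, not in the sink.
                            System.err.println("Job failed: " + error);
                        } else {
                            System.out.println("Job finished: " + result);
                        }
                    });
        }
    }

This avoids blocking on execute, but as noted above it still only reports errors at the job level, not per record in the sink.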
Hi Satyam,
I'm not aware of an API that solves all your problems at once. A common pattern is to catch errors in user code and pipe them through a side output of an operator to dedicated sinks. However, such functionality does not exist in SQL yet.

For the sink part, it might be useful to look into the StreamingFileSink [1], which provides better failure-handling guarantees. Flink 1.11 will ship with a SQL streaming file sink.

Regards,
Timo

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html
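(For illustration, the side-output pattern mentioned above might look roughly like this in the DataStream API. This is a minimal sketch; the parsing logic, tag name, and job name are invented for the example.)

    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class SideOutputErrors {
        // Anonymous subclass so the generic type parameter is preserved.
        private static final OutputTag<String> ERRORS =
                new OutputTag<String>("errors") {};

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            SingleOutputStreamOperator<Integer> parsed = env
                    .fromElements("1", "2", "oops", "4")
                    .process(new ProcessFunction<String, Integer>() {
                        @Override
                        public void processElement(
                                String value, Context ctx, Collector<Integer> out) {
                            try {
                                out.collect(Integer.parseInt(value));
                            } catch (NumberFormatException e) {
                                // Route the failure to the side output
                                // instead of failing the whole job.
                                ctx.output(ERRORS, "bad record: " + value);
                            }
                        }
                    });

            parsed.print();                       // main results
            parsed.getSideOutput(ERRORS).print(); // errors go to a dedicated sink

            env.execute("side-output-errors-sketch");
        }
    }

The errors stream can be sent to any sink (e.g. a file or a message queue) so that failures are observable without tearing the job down.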