(DEPRECATED) Apache Flink User Mailing List archive.

Throwing Recoverable Exceptions from Tasks

chiggi_dev

Throwing Recoverable Exceptions from Tasks

Hi,

I am building an alerting system in which, based on some input events, I need to raise an alert from a user-defined aggregate function.

My first approach was to call a REST API asynchronously to send the alerts out of the task slot. However, this involves I/O from within the task, which, if I understand correctly, should be avoided.

Another way is to use the Job Manager's exceptions REST API to pull the alerts from the Flink cluster. This however requires me to throw an exception from the operator which results in a job failure/restart.

Is there any way to throw exceptions that appear in the REST API but don't restart the job? This might be similar to how some internal exceptions, such as checkpointing-related ones, appear in the REST output.

I plan on exploring the Queryable state for such use cases as well, but the evolving state might be a blocker.

Thanks

Chirag

Chesnay Schepler

Re: Throwing Recoverable Exceptions from Tasks

There is no way to have an exception appear in the REST API without restarting the job; that field is defined precisely as the exception that caused the job to fail.

Using asynchronous calls is fine in itself, as long as you don't wait for any confirmation. In any case, you could remedy the issue by writing the alerts to a side output and having a dedicated sink submit them.
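For illustration, a fire-and-forget call of the kind described here might look like the sketch below. It uses the JDK's `java.net.http.HttpClient` (Java 11+); the class name `FireAndForgetAlert`, the endpoint URL, and the plain-text payload are assumptions made up for the example and are not part of the original discussion. The point is only that `send` returns right away and nothing ever calls `join()` or `get()` on the future.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

/** Sketch of a non-blocking alert sender; all names here are hypothetical. */
class FireAndForgetAlert {

    /** Assumed endpoint for illustration; replace with the real alerting service. */
    static final URI ALERT_ENDPOINT = URI.create("http://localhost:8080/alerts");

    private final HttpClient client = HttpClient.newHttpClient();

    /** Builds a POST request that carries the alert text as its body. */
    static HttpRequest buildRequest(String alert) {
        return HttpRequest.newBuilder(ALERT_ENDPOINT)
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(alert))
                .build();
    }

    /** Starts the request and returns immediately; the caller never waits on the result. */
    CompletableFuture<Void> send(String alert) {
        return client
                .sendAsync(buildRequest(alert), HttpResponse.BodyHandlers.discarding())
                .thenAccept(response -> { /* response is ignored on purpose */ })
                .exceptionally(error -> {
                    // A failed delivery is only reported, never rethrown into the task.
                    System.err.println("alert delivery failed: " + error);
                    return null;
                });
    }
}
```

With this approach a failed delivery is only printed, so an alert can be lost. In a Flink function the client would probably be created in the `open()` method of a rich function rather than in a field initializer, because the function object itself is serialized when the job is deployed.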

On 12/28/2020 5:24 AM, Chirag Dewan wrote:

[...]

Arvid Heise

Re: Throwing Recoverable Exceptions from Tasks

A typical solution to your issue is to use an ELK stack to collect the logs and define some filters on log events.
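As a rough sketch of the log-based route, the alert can be written as a single log line with a fixed marker and `key=value` fields, which makes it easy to filter on in a log pipeline. The class `AlertLog`, the `ALERT` marker, and the field names are assumptions for the example; `java.util.logging` is used only to keep the sketch dependency-free, and a Flink job would more likely log through SLF4J.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of marker-based alert logging; names and format are hypothetical. */
class AlertLog {

    private static final Logger LOG = Logger.getLogger("alerts");

    /** Assumed marker that a log filter could match on. */
    static final String MARKER = "ALERT";

    /** Formats the alert as one line of key=value pairs behind the marker. */
    static String format(String rule, String key, long value) {
        return MARKER + " rule=" + rule + " key=" + key + " value=" + value;
    }

    /** Logs the alert at WARNING level; nothing is thrown, so the job keeps running. */
    static void raise(String rule, String key, long value) {
        LOG.log(Level.WARNING, format(rule, key, value));
    }
}
```

A call such as `AlertLog.raise("max-latency", "sensor-7", 950)` from inside the function then produces a line that a filter on `ALERT` can pick up.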

If it's specific to input data issues, I also found side-outputs useful to store invalid data points. Then, you can simply monitor the side topic (assuming Kafka) and already have the data points at hand to improve your software/investigate the root cause.
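The routing idea behind side outputs can be modeled in plain Java as below. This is not Flink code: the two `Consumer` arguments stand in for the main output and the side output, and the validity rule is invented for the example. In an actual job the same split would be done inside a `ProcessFunction` by calling `ctx.output(outputTag, value)` for the side output, and the side stream would be obtained with `getSideOutput(outputTag)` and attached to its own sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java model of side-output routing; it does not use Flink itself. */
class SideOutputModel {

    /** Hypothetical validity rule for illustration: negative readings are invalid. */
    static boolean isValid(long reading) {
        return reading >= 0;
    }

    /** Routes one element: valid readings go to the main output, invalid ones to the side output. */
    static void processElement(long reading, Consumer<Long> mainOut, Consumer<Long> sideOut) {
        if (isValid(reading)) {
            mainOut.accept(reading);
        } else {
            sideOut.accept(reading); // no exception is thrown, so processing continues
        }
    }

    /** Runs all readings through processElement and returns only the side-output elements. */
    static List<Long> invalidReadings(List<Long> readings) {
        List<Long> main = new ArrayList<>();
        List<Long> side = new ArrayList<>();
        for (long reading : readings) {
            processElement(reading, main::add, side::add);
        }
        return side;
    }
}
```

The same pattern covers the alerting case from earlier in the thread: the side output carries alert records instead of invalid data points, and a dedicated sink submits them.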

On Mon, Dec 28, 2020 at 12:14 PM Chesnay Schepler wrote:

[...]

Arvid Heise | Senior Java Developer
