Throwing Recoverable Exceptions from Tasks

chiggi_dev
Hi,

I am building an alerting system where, based on some input events, I need to raise an alert from a user-defined aggregate function.

My first approach was to use an asynchronous REST API to send alerts outside the task slot. But this obviously involves IO from within the task, which, if I understand correctly, should be avoided.

Another way is to use the JobManager's exceptions REST API to pull the alerts from the Flink cluster. This, however, requires me to throw an exception from the operator, which results in a job failure/restart.

Is there any way I can throw exceptions that appear in the REST API but don't restart the job? This might be similar to how some internal exceptions, such as checkpointing-related ones, appear in the REST output.

I also plan to explore Queryable State for such use cases, but the evolving state might be a blocker.

Thanks
Chirag

Re: Throwing Recoverable Exceptions from Tasks

Chesnay Schepler
There is no way to have an exception appear in the REST API without restarting the job; that field is defined precisely as the exception that caused the job to fail.

Using an asynchronous call by itself is fine, as long as you don't wait for any confirmation. In any case, you could remedy the issue by writing the alerts into a side output and having a dedicated sink for submitting these alerts.
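
As a rough illustration of the "asynchronous, without waiting for confirmation" idea, here is a minimal plain-Java sketch. It uses only the JDK, not any Flink API. The class name `AlertDispatcher` and the `transport` callback (which would wrap whatever client actually sends the alert) are hypothetical names introduced for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hands alerts to a background thread so the caller never waits for delivery. */
class AlertDispatcher implements AutoCloseable {
    // Single worker thread; note that its queue is unbounded in this sketch.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Hypothetical callback: stands in for the real alert client (e.g. a REST call).
    private final Consumer<String> transport;
    private final AtomicLong failures = new AtomicLong();

    AlertDispatcher(Consumer<String> transport) {
        this.transport = transport;
    }

    /** Returns immediately; the outcome of the send is never awaited by the caller. */
    void dispatch(String alert) {
        executor.execute(() -> {
            try {
                transport.accept(alert);
            } catch (RuntimeException e) {
                // Count and swallow: a failed alert must not fail the processing thread.
                failures.incrementAndGet();
            }
        });
    }

    long failureCount() {
        return failures.get();
    }

    /** Stops accepting alerts and waits up to five seconds for queued ones to drain. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a Flink job, one plausible place to create and close such a dispatcher would be the `open()` and `close()` methods of a rich function. Because nothing is confirmed, alerts still queued when a task fails are lost, which is one reason the side-output plus dedicated sink approach may be preferable.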

In reply to Chirag Dewan's message of 12/28/2020 5:24 AM.

Re: Throwing Recoverable Exceptions from Tasks

Arvid Heise-3
A typical solution to your issue is to use an ELK stack to collect the logs and define some filters on log events.
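
To make the log-based approach concrete, here is a small sketch of emitting alerts as single-line log events with a fixed marker that a log filter could match on. The marker `ALERT`, the logger name `alerts`, and the `rule`/`key`/`value` fields are assumptions made up for this example. It uses `java.util.logging` only to stay dependency-free; a real job would more likely log through whatever logging framework the cluster is configured with.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Emits alerts as single-line log events with a fixed marker for log filters. */
class AlertLog {
    // Hypothetical marker; use whatever token the log filter is configured to match.
    static final String MARKER = "ALERT";
    // Hypothetical logger name, chosen so alerts can be routed separately if desired.
    private static final Logger LOG = Logger.getLogger("alerts");

    /** Builds the line separately so the format can be checked without a log backend. */
    static String format(String rule, String key, double value) {
        return String.format(Locale.ROOT, "%s rule=%s key=%s value=%.2f",
                MARKER, rule, key, value);
    }

    /** Writes the alert at WARNING level; the job itself keeps running. */
    static void raise(String rule, String key, double value) {
        LOG.log(Level.WARNING, format(rule, key, value));
    }
}
```

Keeping each alert on one line with stable `key=value` pairs tends to make the filter on the collecting side simpler, since no multi-line parsing is needed.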

If it's specific to input data issues, I have also found side outputs useful for storing invalid data points. Then you can simply monitor the side topic (assuming Kafka) and already have the data points at hand to improve your software or investigate the root cause.
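
The routing idea behind this can be modelled in a few lines of plain Java. This is not Flink code: the two `Consumer` parameters merely stand in for the main output and the side output, and `ReadingRouter` and its parsing rule are hypothetical. In an actual job the second consumer would correspond to emitting to a side output, whose stream is then written by its own sink.

```java
import java.util.function.Consumer;

/** Plain-Java model of main-output vs. side-output routing; no Flink classes involved. */
class ReadingRouter {
    /**
     * Parses one raw record (assumed non-null). Valid values go to mainOut;
     * unparseable input goes to sideOut unchanged, so it can be inspected later.
     */
    static void process(String raw, Consumer<Long> mainOut, Consumer<String> sideOut) {
        long value;
        try {
            value = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Invalid record: divert it instead of throwing and failing the job.
            sideOut.accept(raw);
            return;
        }
        mainOut.accept(value);
    }
}
```

Forwarding the original raw record, rather than just an error message, is what leaves the data points at hand for root-cause analysis.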

In reply to Chesnay Schepler's message of Mon, Dec 28, 2020 at 12:14 PM.

--

Arvid Heise | Senior Java Developer