Capturing the exception that leads to a job entering the FAILED state

Andrew Roberts
Hello,

I’m trying to connect our Flink deployment to our error-monitoring tool, and I’m struggling to find an entry point for capturing the exception that causes a job to fail. I’ve looked through some of the source, but I can’t connect anything I’ve found to the job submission API we’re using (`env.execute()` after building a graph). Is there a supported way to either add a listener or otherwise inspect the `Throwable` that led to a job failure? I’d like to be able to capture it even when the restart policy queues the job for resubmission.

Thanks,

Andrew
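
For illustration only, here is a minimal sketch of the most basic capture point implied by the question: wrapping the blocking submission call and forwarding whatever it throws. It is not a Flink API. `FailureCapture`, `runAndReport`, and `rootCause` are hypothetical names, and the sketch uses only the Java standard library. It assumes the failure propagates out of the submission call as an exception; failures that the restart policy handles internally would presumably never reach this wrapper, so it does not cover the resubmission case asked about above.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical helper, not part of Flink: reports whatever a blocking call throws.
public final class FailureCapture {
    private FailureCapture() {}

    /** Walks the cause chain to the innermost Throwable; stops if a cause points back at itself. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Runs the submission, hands any exception to the monitor, then rethrows it unchanged. */
    public static <T> T runAndReport(Callable<T> submission, Consumer<Throwable> monitor)
            throws Exception {
        try {
            return submission.call();
        } catch (Exception e) {
            monitor.accept(e); // forward to the error-monitoring tool
            throw e;           // preserve the original failure for the caller
        }
    }
}
```

Under those assumptions, the call site might look like `FailureCapture.runAndReport(() -> env.execute(), t -> errorMonitor.report(FailureCapture.rootCause(t)))`, where `errorMonitor.report` stands in for whatever client the monitoring tool provides.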