Job completion or failure callback?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Job completion or failure callback?

Shannon Carey
Hi,

Is there any way we can run a callback on job completion or failure without leaving the client running during job execution? For example, when we submit the job via the web UI the main() method's call to ExecutionEnvironment#execute() appears to by asynchronous with the job execution. Therefore, the execute() call returns before the job is completed. This is a bit confusing because the behavior is different when run from the IDE vs. run in a cluster, and because signature of the returned class JobExecutionResult implies that it can tell you how long execution took (it has getNetRuntime()). We would like to be able to detect job completion or failure so that we can monitor the success or failure of batch jobs, in particular, so that we can react to failures appropriately. It seems like the JobManager should be capable of executing callbacks like this. Otherwise, we'll have to create an external component that eg. polls the web UI/API for job status.

Does the web UI run in the same JVM as the JobManager (when deployed in YARN)? If so, I would expect logs from the main method to appear in the JobManager logs. However, for some reason I can't find log messages or System.out  messages when they are logged in the main() method after execute() is called. Why is that?
Edit: figured it out: OptimizerPlanEnvironment#execute() ends with "throw new ProgramAbortException()". Tricky and unexpected. That should definitely be mentioned in the javadocs of the execute() method! Even the documentation says, "The execute() method is returning a JobExecutionResult, this contains execution times and accumulator results." which isn't true, or at least isn't always true.

Thanks,
Shannon
Reply | Threaded
Open this post in threaded view
|

Re: Job completion or failure callback?

rmetzger0
Hi Shannon,

the web UI runs on the same JVM as the JobManager, so log outputs should go there.

There is no way of running user code on the JobManager on job completion. We try to not allow users to execute code on the JobManager...bringing the JM down, will kill the entire cluster :)

What you are asking for is a valid feature. That's why there is a very old JIRA for adding it: https://issues.apache.org/jira/browse/FLINK-2313.
I'm not aware of anybody working on the feature right now, but I know that we at data Artisans are considering to put some effort into it for the 1.4 release (no guarantee for this of course, others are obviously free to pick the issue up any time).

For now, you'll probably have to resort to the web REST api, or to connecting to the JobManager actor system and subscribing to the "JobStatusListener" or something :)

I hope that helps.

Regards,
Robert


On Thu, Mar 9, 2017 at 12:29 AM, Shannon Carey <[hidden email]> wrote:
Hi,

Is there any way we can run a callback on job completion or failure without leaving the client running during job execution? For example, when we submit the job via the web UI the main() method's call to ExecutionEnvironment#execute() appears to by asynchronous with the job execution. Therefore, the execute() call returns before the job is completed. This is a bit confusing because the behavior is different when run from the IDE vs. run in a cluster, and because signature of the returned class JobExecutionResult implies that it can tell you how long execution took (it has getNetRuntime()). We would like to be able to detect job completion or failure so that we can monitor the success or failure of batch jobs, in particular, so that we can react to failures appropriately. It seems like the JobManager should be capable of executing callbacks like this. Otherwise, we'll have to create an external component that eg. polls the web UI/API for job status.

Does the web UI run in the same JVM as the JobManager (when deployed in YARN)? If so, I would expect logs from the main method to appear in the JobManager logs. However, for some reason I can't find log messages or System.out  messages when they are logged in the main() method after execute() is called. Why is that?
Edit: figured it out: OptimizerPlanEnvironment#execute() ends with "throw new ProgramAbortException()". Tricky and unexpected. That should definitely be mentioned in the javadocs of the execute() method! Even the documentation says, "The execute() method is returning a JobExecutionResult, this contains execution times and accumulator results." which isn't true, or at least isn't always true.

Thanks,
Shannon