JobListener weird behaviour

Flavio Pompermaier
Hello everybody,
These days I have been trying to use the JobListener to implement a simple piece of logic in our platform: call an external service to signal that the job has ended and, in case of failure, save the error cause.
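The reporting logic described above can be sketched in plain Java, so it runs without Flink on the classpath. The sketch follows the two-argument shape of Flink's `JobListener.onJobExecuted(JobExecutionResult, Throwable)`. However, `JobEndReporter`, `StatusService` and the plain `String` job ID are hypothetical stand-ins and not Flink API. In a real listener the job ID would come from the `JobExecutionResult`.

```java
import java.util.Objects;

// Hypothetical client for the external platform service (not a Flink API).
interface StatusService {
    // errorCause is null when the job is reported as successful.
    void jobEnded(String jobId, String errorCause);
}

// Hypothetical reporter. onJobExecuted mirrors the two-argument shape of Flink's
// JobListener.onJobExecuted(JobExecutionResult, Throwable), but takes the job ID
// as a plain String instead of a JobExecutionResult.
class JobEndReporter {
    private final StatusService service;

    JobEndReporter(StatusService service) {
        this.service = Objects.requireNonNull(service, "service");
    }

    void onJobExecuted(String jobId, Throwable throwable) {
        // On failure, report the innermost cause; on success, report no cause.
        String cause = throwable == null ? null : rootCause(throwable).toString();
        service.jobEnded(jobId, cause);
    }

    // Follows getCause() down to the innermost exception.
    private static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }
}
```

This thread reports that the Throwable can be null even when the job failed. A null here should therefore not, by itself, be taken as proof of success.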

After some problems making it work when starting a job through the RestClusterClient on a standalone session cluster, I found that it behaves strangely when the job fails. onJobExecuted(JobExecutionResult, Throwable) is called with a null Throwable. Also, if I use the JobID contained in the JobExecutionResult to fetch the job status/error, I keep seeing the job status as RUNNING for a while. I do this in a monitoring thread that I start in onJobSubmitted(), as I explain at the end of this email.
Shouldn't onJobExecuted() be called after the final job state transition (as the very last callback)?

One last weird thing: I submit the job using the JarRunHandler of the REST API. However, the JobClient passed to onJobSubmitted() is a WebSubmissionJobClient, which is a VERY basic implementation (it only provides the job ID) and does not allow fetching the job status. For this reason, and because onJobExecuted is not called on the final state transition, I had to create a separate monitor thread in onJobSubmitted. That thread creates a RestClusterClient to fetch the job status every 10 seconds and, in case of failure, the associated exceptions. This is very uncomfortable and I don't really like it. Is there any effort to improve this?
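The monitor-thread workaround described above (poll the job status until it settles) can be sketched in plain Java with a `ScheduledExecutorService`. The `JobState` enum only borrows a few names from Flink's `JobStatus`. The `Supplier<JobState>` is a hypothetical stand-in for whatever actually fetches the status, for example a call made through a `RestClusterClient`. Neither is Flink API.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Assumption: a reduced set of states, named after values of Flink's JobStatus.
enum JobState { RUNNING, RESTARTING, FAILED, CANCELED, FINISHED }

class JobStatusMonitor {
    // Globally terminal states: the job does not change state after these.
    private static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.CANCELED, JobState.FINISHED);

    // Polls fetchStatus every periodMillis on a background daemon thread until it
    // returns a terminal state. The future completes with that state, or
    // exceptionally if a poll throws.
    static CompletableFuture<JobState> awaitTerminal(Supplier<JobState> fetchStatus, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "job-status-monitor");
            t.setDaemon(true);
            return t;
        });
        CompletableFuture<JobState> result = new CompletableFuture<>();
        scheduler.scheduleWithFixedDelay(() -> {
            if (result.isDone()) {
                return;
            }
            try {
                JobState state = fetchStatus.get();
                if (TERMINAL.contains(state)) {
                    result.complete(state);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        // Stop polling as soon as the outcome is known.
        result.whenComplete((state, error) -> scheduler.shutdown());
        return result;
    }
}
```

With the 10-second interval mentioned above, the call would be `awaitTerminal(fetch, 10_000)`. Returning a future rather than blocking lets onJobSubmitted return immediately. The caller then attaches the "notify the external service" step with `thenAccept`.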

Best,
Flavio

Re: JobListener weird behaviour

Till Rohrmann
Hi Flavio,

Looking only at the code, the job should first transition into a globally terminal state before the client is notified about it. The only possible reason I can see for this behaviour is that the RestServerEndpoint uses an ExecutionGraphCache (DefaultExecutionGraphCache is the implementation). It caches `ArchivedExecutionGraphs` so that the REST handlers don't flood the Dispatcher with `requestJob` requests. The cache keeps entries for 3 seconds before asking the cluster again. You might therefore query a REST handler that answers based on cached, and thus outdated, results. At the moment, the only easy way to work around this problem is to decrease `web.refresh-interval`.
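A small time-to-live cache in plain Java shows why a REST handler can still report RUNNING after the job has failed. It is a simplified illustration and not Flink's `DefaultExecutionGraphCache`. The class name, the injectable clock and the 3000 ms TTL used in the test are assumptions for the example; 3000 ms simply matches the 3 seconds mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Simplified time-to-live cache. It illustrates the stale-read effect;
// it is not Flink's DefaultExecutionGraphCache.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;

        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // asks the "cluster" for fresh data
    private final long ttlMillis;
    private final LongSupplier clock;    // injectable so the example is deterministic

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value while it is younger than the TTL, otherwise reloads it.
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

With a 3-second TTL, a lookup one second after the job failed still returns the cached RUNNING. The new state is loaded only once the entry expires. Shrinking the TTL (in Flink, lowering `web.refresh-interval` as suggested above) narrows that window. The cost is more requests reaching the Dispatcher.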

As for the JarRunHandler problem, I fear this is a limitation of the web submission implementation, which has accumulated a bit of technical debt. As far as I know, nobody is actively working on it at the moment.

Cheers,
Till 

On Tue, Nov 24, 2020 at 10:00 AM Flavio Pompermaier <[hidden email]> wrote: