Figuring out when a job has successfully restored state

Gyula Fóra
Hi all,

I am trying to figure out the best way to tell when a job has successfully restored all of its state and started processing.

My first idea was to check the REST API for the number of processed bytes of each parallel operator: if that is greater than 0, the operator has started. Unfortunately, this logic fails if the operator doesn't receive any input for some time.

Do we have any info like this exposed somewhere in a nicely queryable way?

Thanks,
Gyula
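
The heuristic from the message above can be sketched as follows. This is a minimal illustration under stated assumptions, not Flink's API: the function name `started_by_bytes` is hypothetical, and the input is assumed to be a list of per-subtask byte counts that has already been fetched from the REST API (the thread does not name the endpoint or the fields). The second call shows the failure mode described in the message: a subtask that has restored its state but is idle cannot be told apart from one that is still restoring.

```python
from typing import Sequence


def started_by_bytes(bytes_read_per_subtask: Sequence[int]) -> bool:
    """Heuristic: treat the operator as started once every parallel
    subtask has processed at least one byte."""
    return len(bytes_read_per_subtask) > 0 and all(
        n > 0 for n in bytes_read_per_subtask
    )


# Every subtask has seen input, so the heuristic reports "started".
print(started_by_bytes([1024, 2048, 512]))

# The second subtask has no input yet. Even if its state is fully
# restored, the heuristic still reports "not started".
print(started_by_bytes([1024, 0, 512]))
```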

Re: Figuring out when a job has successfully restored state

Gyula Fóra
Hi,

Another thought I had last night: maybe we could have another state for recovering jobs in the future.
Deploying -> Recovering -> Running
This recovering state might only apply to state backends that have to be restored before processing can start; lazy state backends (like external databases) might go into the processing state directly.

What do you think? (I'm ccing dev)
Gyula
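
The proposed lifecycle can be sketched as a small state machine. Everything here is hypothetical and only mirrors the proposal in the message above: `TaskState`, `next_state`, and the `restores_eagerly` flag are illustrative names, not Flink's actual execution states or API.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical states taken from the proposal, not from Flink.
    DEPLOYING = "Deploying"
    RECOVERING = "Recovering"
    RUNNING = "Running"


def next_state(current: TaskState, restores_eagerly: bool) -> TaskState:
    """Return the state that follows `current` in the proposed lifecycle.

    Backends that must restore state before processing pass through
    RECOVERING; lazy backends go from DEPLOYING straight to RUNNING.
    """
    if current is TaskState.DEPLOYING:
        return TaskState.RECOVERING if restores_eagerly else TaskState.RUNNING
    if current is TaskState.RECOVERING:
        return TaskState.RUNNING
    return current  # RUNNING has no further transition in this sketch


# Eager backend: Deploying -> Recovering -> Running
state = TaskState.DEPLOYING
state = next_state(state, restores_eagerly=True)
print(state.value)
state = next_state(state, restores_eagerly=True)
print(state.value)

# Lazy backend: Deploying -> Running
print(next_state(TaskState.DEPLOYING, restores_eagerly=False).value)
```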

On Mon, Mar 27, 2017 at 17:06, Gyula Fóra <[hidden email]> wrote:
Hi all,

I am trying to figure out the best way to tell when a job has successfully restored all of its state and started processing.

My first idea was to check the REST API for the number of processed bytes of each parallel operator: if that is greater than 0, the operator has started. Unfortunately, this logic fails if the operator doesn't receive any input for some time.

Do we have any info like this exposed somewhere in a nicely queryable way?

Thanks,
Gyula

Re: Figuring out when a job has successfully restored state

Till Rohrmann-2
Hi Gyula,

There is a related issue [1]. Fixing it will move state restoration into the DEPLOYING state. This means that when you see a task in state RUNNING, it will have restored all of its eager state.


Cheers,
Till
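
Once state restoration happens during DEPLOYING, the check reduces to "is every task RUNNING?". Below is a minimal sketch of that check over an already-parsed job detail response. The function name `all_tasks_running` is hypothetical, and the JSON shape (a `vertices` list whose entries carry a `tasks` map of per-state subtask counts) is an assumption modeled on Flink's job detail REST response; verify it against the Flink version in use. The check is only meaningful on versions that include the fix for issue [1], and it says nothing about lazily restored state.

```python
from typing import Any, Mapping


def all_tasks_running(job_detail: Mapping[str, Any]) -> bool:
    """Return True when every subtask of every vertex is RUNNING.

    Assumes each vertex has a "tasks" map from state name to the number
    of subtasks currently in that state (an assumed response shape).
    """
    vertices = job_detail.get("vertices", [])
    if not vertices:
        return False
    for vertex in vertices:
        counts = vertex.get("tasks", {})
        total = sum(counts.values())
        if total == 0 or counts.get("RUNNING", 0) != total:
            return False
    return True


# Hand-written sample data: one "window" subtask is still deploying,
# so it may still be restoring state.
sample = {
    "vertices": [
        {"name": "source", "tasks": {"DEPLOYING": 0, "RUNNING": 2}},
        {"name": "window", "tasks": {"DEPLOYING": 1, "RUNNING": 1}},
    ]
}
print(all_tasks_running(sample))  # False

sample["vertices"][1]["tasks"] = {"DEPLOYING": 0, "RUNNING": 2}
print(all_tasks_running(sample))  # True
```

In practice the caller would poll the REST API on an interval and pass each parsed response to this function until it returns True or a timeout expires.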

On Tue, Mar 28, 2017 at 10:55 AM, Gyula Fóra <[hidden email]> wrote:
Hi,

Another thought I had last night: maybe we could have another state for recovering jobs in the future.
Deploying -> Recovering -> Running
This recovering state might only apply to state backends that have to be restored before processing can start; lazy state backends (like external databases) might go into the processing state directly.

What do you think? (I'm ccing dev)
Gyula

Re: Figuring out when a job has successfully restored state

Gyula Fóra
Thanks Till, 
This is exactly what I was looking for :)

Gyula

On Wed, Mar 29, 2017 at 10:23, Till Rohrmann <[hidden email]> wrote:
Hi Gyula,

There is a related issue [1]. Fixing it will move state restoration into the DEPLOYING state. This means that when you see a task in state RUNNING, it will have restored all of its eager state.


Cheers,
Till
