Update Checkpoint and/or Savepoint Timeout of Running Job without restart?


Aaron Levin
Hello,

Question: Is it possible to update the checkpoint and/or savepoint timeout of a running job without restarting it? If not, would this be a welcome contribution? (I'm not sure how easy it would be to implement.)

Context: sometimes we have jobs that are making progress but get into a state where checkpoints time out, though we believe they would succeed if we could increase the checkpoint timeout. Unfortunately, we currently need to restart the job to change this, and we would like to avoid that if possible. Ideally, we could make this change temporarily, allow a checkpoint or savepoint to succeed, and then change the setting back.

Best,

Aaron Levin
Re: Update Checkpoint and/or Savepoint Timeout of Running Job without restart?

Congxian Qiu
Hi

Currently, we can't change the checkpoint timeout of a running job, but there is an open issue [1] that proposes a separate timeout for savepoints.
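For context, here is a minimal sketch of where the checkpoint timeout is typically set. It assumes the job is an Apache Flink DataStream job (the thread does not name the framework explicitly), and the class name, job name, interval, and timeout values are purely illustrative. The timeout is part of the configuration the job is submitted with, which is why changing it currently means resubmitting the job:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical example class; names and values are assumptions for illustration.
public class CheckpointTimeoutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 60 seconds (illustrative interval).
        env.enableCheckpointing(60_000L);

        // Abort any checkpoint that has not completed within 30 minutes.
        // This value is fixed once the job is submitted.
        env.getCheckpointConfig().setCheckpointTimeout(30 * 60 * 1000L);

        // Placeholder pipeline so the job has something to run.
        env.fromElements(1, 2, 3).print();

        env.execute("checkpoint-timeout-example");
    }
}
```

Raising the timeout in this sketch would require editing the value and restarting the job, ideally from a savepoint or the latest completed checkpoint.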


On Sat, Aug 17, 2019 at 12:37 AM Aaron Levin <[hidden email]> wrote:
Hello,

Question: Is it possible to update the checkpoint and/or savepoint timeout of a running job without restarting it? If not, would this be a welcome contribution? (I'm not sure how easy it would be to implement.)

Context: sometimes we have jobs that are making progress but get into a state where checkpoints time out, though we believe they would succeed if we could increase the checkpoint timeout. Unfortunately, we currently need to restart the job to change this, and we would like to avoid that if possible. Ideally, we could make this change temporarily, allow a checkpoint or savepoint to succeed, and then change the setting back.

Best,

Aaron Levin
Re: Update Checkpoint and/or Savepoint Timeout of Running Job without restart?

Aaron Levin
Thanks for the answer, Congxian!

On Sun, Aug 18, 2019 at 10:43 PM Congxian Qiu <[hidden email]> wrote:
Hi

Currently, we can't change the checkpoint timeout of a running job, but there is an open issue [1] that proposes a separate timeout for savepoints.

