(DEPRECATED) Apache Flink User Mailing List archive.

Job recovery from a checkpoint

min.tan

Job recovery from a checkpoint

Hi,

We can recover a job nicely from a savepoint after restarting our Flink cluster, using

bin/flink run -s :savepointPath [:runArgs]

The job's previous state is restored after this reload.

I expected to do something similar to recover a Flink job from a checkpoint location after restarting our Flink cluster (JobManager and TaskManager), using

bin/flink run -s checkpointPath/_metadata [:runArgs]

However, the reloaded job does not seem to keep its previous state.

Am I doing something wrong? I assume this is possible and requires no additional configuration?

Regards,

Min

Yun Tang

Re: Job recovery from a checkpoint

Hi Min

First of all, Flink can resume from an externalized (retained) checkpoint with the same command used to restore from a savepoint.

Did you configure the externalized checkpoint to be retained after the job is cancelled?
Did you pass the correct checkpoint path (including the chk-xxx folder) on the command line?
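A minimal shell sketch for finding that path, assuming the usual filesystem layout `<state.checkpoints.dir>/<job-id>/chk-<N>/_metadata`; the function name `latest_checkpoint` is hypothetical and not part of Flink. It picks the highest-numbered chk-<N> directory that contains a `_metadata` file, since a chk-<N> directory without one usually belongs to a checkpoint that never completed. Such a directory only survives `bin/flink cancel` if retention is enabled, e.g. with `execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION` in flink-conf.yaml (Flink 1.11 and later), or with `CheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)` in the job code of older versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Flink command): print the newest complete
# checkpoint directory (chk-<N>) under one job's checkpoint directory.
# Assumed layout: <state.checkpoints.dir>/<job-id>/chk-<N>/_metadata
latest_checkpoint() {
  local job_dir="$1"
  local best="" best_n=-1 d n
  for d in "$job_dir"/chk-*; do
    [ -d "$d" ] || continue              # skip the literal pattern if nothing matched
    [ -f "$d/_metadata" ] || continue    # skip checkpoints without a metadata file
    n="${d##*/chk-}"                     # extract the numeric checkpoint id
    case "$n" in ''|*[!0-9]*) continue ;; esac
    if [ "$n" -gt "$best_n" ]; then      # numeric, not lexical, comparison
      best_n="$n"
      best="$d"
    fi
  done
  [ -n "$best" ] || return 1             # no complete checkpoint found
  printf '%s\n' "$best"
}
```

Usage, with a placeholder path: `bin/flink run -s "$(latest_checkpoint /path/to/checkpoints/JOB_ID)" [:runArgs]`. The numeric comparison is deliberate: a plain lexical sort would rank chk-9 above chk-10.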

If the path is correct, please check the JobManager log to see what happened: did the job actually restore from the checkpoint you intended?
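A loose way to pull the relevant lines out of the JobManager log, sketched under assumptions: `restore_lines` is a hypothetical helper, and because the exact wording of Flink's restore messages differs between versions, it matches case-insensitively on "restor" or "from savepoint"/"from checkpoint" rather than on one specific message. The log file name depends on how the cluster is deployed; on a standalone cluster the logs are typically under the `log/` directory of the Flink distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines that look related to restoring state.
# Matching is deliberately loose because message wording varies by version.
# Accepts one or more log files (grep reads stdin if none are given) and
# returns non-zero when no matching line is found.
restore_lines() {
  grep -Ei 'restor|from (savepoint|checkpoint)' "$@"
}
```

For example, `restore_lines log/*.log`. If none of the printed lines mentions the checkpoint path passed with `-s`, that suggests the job may have started without restoring any state.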

Best