Start flink job from the latest checkpoint programmatically



Eleanore Jin
Hi All, 

My Flink application is set up so that users can start and stop it.

The Flink job runs in a job cluster (the application jar is available to Flink at startup). Stopping a running application means exiting the program.

Restarting a stopped job means spinning up a new job cluster with the same application jar, which is essentially a new Flink job.

Is there a way to let the restarted job resume from the latest checkpoint of the previously stopped Flink job? And can this be set up programmatically in the application?

Thanks a lot!
Eleanore



Re: Start flink job from the latest checkpoint programmatically

Flavio Pompermaier
Have you tried to retain checkpoints or use savepoints? Take a look at [1] and see if that can help.
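To make the retained-checkpoint route concrete, here is a minimal sketch of how a launcher could pick the latest retained checkpoint before restarting. It assumes checkpoints are retained (for example with `ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION`), that they are written to a locally mounted directory, and that they follow the usual `<state.checkpoints.dir>/<job-id>/chk-<n>/_metadata` layout. The function name `latest_retained_checkpoint` is made up for illustration, and a remote store such as S3 or HDFS would need a different listing call.

```python
import re
from pathlib import Path
from typing import Optional

# Checkpoint directories are named chk-<n>; other entries (e.g. shared/) are ignored.
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def latest_retained_checkpoint(job_checkpoint_dir: str) -> Optional[Path]:
    """Return the completed chk-<n> directory with the highest n, or None.

    job_checkpoint_dir is assumed to be <state.checkpoints.dir>/<job-id>.
    """
    root = Path(job_checkpoint_dir)
    if not root.is_dir():
        return None

    best_id, best_path = -1, None
    for child in root.iterdir():
        match = _CHK_DIR.match(child.name)
        # Treat a checkpoint as usable only if its _metadata file exists.
        if match and (child / "_metadata").is_file():
            chk_id = int(match.group(1))  # compare numerically, so chk-10 > chk-2
            if chk_id > best_id:
                best_id, best_path = chk_id, child
    return best_path
```

The returned path can then be handed to the job at startup in the same way as a savepoint path.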

Best,

(In reply to Eleanore Jin's message of Fri, 13 Mar 2020, 00:02.)



Re: Start flink job from the latest checkpoint programmatically

Vijay Bhaskar
There are two things you can do. One is to use a savepoint:

1. Stopping the Flink job generates a savepoint.
2. Save the savepoint directory path in a persistent store. (This is needed because you are restarting the cluster; otherwise the checkpoint monitoring API would give you the savepoint file details.)
3. After spinning up the cluster, read the savepoint path back and use the Flink monitoring REST API to start the job from that savepoint.
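The savepoint steps above could be sketched roughly as follows. This is only an illustration under some assumptions: a plain file stands in for the persistent store, the helper names are hypothetical, and the request targets the REST endpoint `POST /jars/:jarid/run` with a `savepointPath` field, which applies to a jar that has been uploaded to the cluster. Whether that endpoint is usable in a job-cluster deployment should be checked for your setup.

```python
import json
import urllib.request
from pathlib import Path


def save_savepoint_path(store_file: str, savepoint_path: str) -> None:
    """Persist the savepoint path so it survives the cluster being torn down."""
    Path(store_file).write_text(savepoint_path.strip() + "\n", encoding="utf-8")


def load_savepoint_path(store_file: str) -> str:
    """Read back the savepoint path recorded before the job was stopped."""
    return Path(store_file).read_text(encoding="utf-8").strip()


def build_run_request(rest_url: str, jar_id: str,
                      savepoint_path: str) -> urllib.request.Request:
    """Build (but do not send) the REST request that starts a jar from a savepoint."""
    body = json.dumps({"savepointPath": savepoint_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/jars/{jar_id}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

After the new cluster is up, something like `urllib.request.urlopen(build_run_request(rest_url, jar_id, load_savepoint_path(store_file)))` would submit the job, where `rest_url`, `jar_id` and `store_file` are values from your own environment.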

You can also use a retained checkpoint, as mentioned above.

Another option is to pass the savepoint path manually as a program argument and respin your job.
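For the manual option, a small launcher might look like the sketch below. It is an assumption-laden illustration: the `--jar` and `--savepoint-path` launcher arguments and the function names are invented here, and the command it builds uses the `flink run -s <path>` form of the CLI. A job-cluster entrypoint takes its own arguments (the equivalent option there is, as far as I know, `--fromSavepoint`), so verify against the Flink version you run.

```python
import argparse
from typing import List, Optional, Sequence


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse the launcher's own arguments (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Respin a Flink job")
    parser.add_argument("--jar", required=True, help="path to the application jar")
    parser.add_argument("--savepoint-path", default=None,
                        help="savepoint or retained checkpoint to resume from")
    return parser.parse_args(argv)


def build_command(jar_path: str, savepoint_path: Optional[str],
                  program_args: Sequence[str] = ()) -> List[str]:
    """Build the `flink run` command line, resuming from a savepoint if one is given."""
    cmd = ["flink", "run"]
    if savepoint_path:
        cmd += ["-s", savepoint_path]  # -s is the short form of --fromSavepoint
    cmd.append(jar_path)
    cmd.extend(program_args)  # everything after the jar goes to the job itself
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. With no `--savepoint-path`, the job simply starts fresh.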

Best,
Bhaskar


(In reply to Flavio Pompermaier's message of Fri, Mar 13, 2020, 5:47 AM.)