high-availability.storageDir clean up?

Elias Levy
I noticed in one of our clusters that there are relatively old submittedJobGraph* and completedCheckpoint* files.  I was wondering at what point it is safe to clean some of these up.


Re: high-availability.storageDir clean up?

Fabian Hueske-2
Hi Elias,

Till (in CC) is familiar with Flink's HA implementation.
He might be able to answer your question.

Thanks,
Fabian




Re: high-availability.storageDir clean up?

Till Rohrmann
Hi Elias,

Flink removes these files once the job has reached a globally terminal state (FINISHED, FAILED, CANCELED). The files should only remain if the cluster crashed. That gives you the opportunity to restart the cluster, which can then recover the jobs that have not yet reached a globally terminal state. If you don't intend to recover these jobs, it should be safe to delete the files.

Cheers,
Till
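
For anyone who wants to review leftover files before deleting anything, here is a minimal Python sketch that only lists candidates. It assumes the high-availability.storageDir is reachable as a local path (for example through a mount), which may not hold for HDFS or S3 setups. The function name find_stale_ha_files and the age threshold are hypothetical and not part of Flink. It matches the two file-name prefixes mentioned in this thread and never deletes anything on its own.

```python
import time
from pathlib import Path

# File-name prefixes mentioned in the thread.
HA_PREFIXES = ("submittedJobGraph", "completedCheckpoint")


def find_stale_ha_files(storage_dir, min_age_seconds, now=None):
    """List HA files under storage_dir whose mtime is older than min_age_seconds.

    Hypothetical helper: it assumes storage_dir is a locally accessible
    directory and only reports candidates. Deleting them is safe only if
    you do not intend to recover the corresponding jobs.
    """
    now = time.time() if now is None else now
    stale = []
    # Search recursively, since the files may sit in subdirectories.
    for path in Path(storage_dir).rglob("*"):
        if not path.is_file() or not path.name.startswith(HA_PREFIXES):
            continue
        if now - path.stat().st_mtime > min_age_seconds:
            stale.append(path)
    return sorted(stale)
```

Calling find_stale_ha_files with your storage directory and, say, 30 * 24 * 3600 returns the matching files untouched for at least 30 days. Check the result against the jobs you still want to recover before removing anything.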
