Restoring state from an incremental RocksDB checkpoint


Restoring state from an incremental RocksDB checkpoint

Yuval Itzchakov
Hi,

We're using RocksDB as our state backend. Because of high backpressure in one of our operators, we have reached a point where savepoints no longer complete.

We have retained previous checkpoints, so I was wondering whether they can serve as a restore point. I'm unsure because we use RocksDB's incremental snapshot capability: would an incremental checkpoint be missing data, or does it point to the parts of previous checkpoints that it still needs?

Re: Restoring state from an incremental RocksDB checkpoint

Andrey Zagrebin-4
Hi Yuval,

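You should be able to restore from the last retained checkpoint by restarting the job with the same checkpoint directory.
An incremental checkpoint references the files it shares with earlier checkpoints, and an incremental part is removed only when none of the retained checkpoints points to it.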

Best,
Andrey
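
To make the cleanup rule above concrete, here is a minimal Java sketch of reference counting over shared files. It is an illustrative model only, not Flink's actual implementation: the class name, method names and the `.sst` file names are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of "a shared file is deleted only when no retained
// checkpoint references it". Not Flink code.
class SharedFileRegistrySketch {
    // file name -> number of retained checkpoints that reference it
    private final Map<String, Integer> refCounts = new HashMap<>();
    // checkpoint id -> files that checkpoint references
    private final Map<Long, List<String>> checkpoints = new HashMap<>();

    // Record a completed checkpoint and the files it points to.
    void registerCheckpoint(long id, List<String> files) {
        checkpoints.put(id, new ArrayList<>(files));
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    // Drop a checkpoint; return the files that nothing references any more.
    Set<String> discardCheckpoint(long id) {
        Set<String> deletable = new TreeSet<>();
        List<String> files = checkpoints.remove(id);
        if (files == null) {
            return deletable; // unknown checkpoint, nothing to release
        }
        for (String file : files) {
            int remaining = refCounts.merge(file, -1, Integer::sum);
            if (remaining == 0) {
                refCounts.remove(file);
                deletable.add(file);
            }
        }
        return deletable;
    }

    // Files still needed by at least one retained checkpoint.
    Set<String> liveFiles() {
        return new TreeSet<>(refCounts.keySet());
    }
}
```

In this model, if checkpoint 1 references `a.sst` and `b.sst`, and checkpoint 2 references `b.sst` and `c.sst`, then discarding checkpoint 1 releases only `a.sst`. Checkpoint 2 still finds every file it needs, which is why the latest retained incremental checkpoint is complete on its own.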

> On 13 Mar 2020, at 16:06, Yuval Itzchakov <[hidden email]> wrote:
>
> Hi,
>
> We're using RocksDB as our state backend. Because of high backpressure in one of our operators, we have reached a point where savepoints no longer complete.
>
> We have retained previous checkpoints, so I was wondering whether they can serve as a restore point. I'm unsure because we use RocksDB's incremental snapshot capability: would an incremental checkpoint be missing data, or does it point to the parts of previous checkpoints that it still needs?


Re: Restoring state from an incremental RocksDB checkpoint

Andrey Zagrebin-4
As I understand it, you have already enabled retained checkpoints [1]. That matters because, once a job has been cancelled, only retained checkpoints can be used to restart it.
Just in case, here are also the links to the docs on restoring from a retained checkpoint [2] and on how to find the path to it [3].

[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure
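
As a rough illustration of finding that path, the sketch below scans a job's checkpoint directory for the highest-numbered `chk-N` subdirectory that contains a `_metadata` file, following the directory structure described in [3]. The class and method names are hypothetical, and it assumes the checkpoint directory is reachable through the local filesystem API. For HDFS or S3 you would need the matching filesystem client instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of Flink.
class LatestCheckpointFinder {

    // jobCheckpointDir is assumed to be <checkpoint-dir>/<job-id>, i.e. the
    // directory that holds chk-1, chk-2, ..., shared and taskowned.
    static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        Path best = null;
        long bestId = -1;
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(jobCheckpointDir, "chk-*")) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                long id;
                try {
                    id = Long.parseLong(name.substring("chk-".length()));
                } catch (NumberFormatException e) {
                    continue; // not a chk-<number> directory
                }
                // Only consider directories that contain a _metadata file;
                // compare ids numerically so chk-12 beats chk-3.
                if (id > bestId && Files.isRegularFile(dir.resolve("_metadata"))) {
                    bestId = id;
                    best = dir;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The returned path is the one you would pass when resuming, for example `bin/flink run -s <checkpoint path> ...` as described in [2].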

> On 14 Mar 2020, at 00:12, Andrey Zagrebin <[hidden email]> wrote:
>
> Hi Yuval,
>
> You should be able to restore from the last checkpoint by restarting the job with the same checkpoint directory.
> An incremental part is removed only if none of the retained checkpoints points to it.
>
> Best,
> Andrey
