(DEPRECATED) Apache Flink User Mailing List archive.

Restoring state from an incremental RocksDB checkpoint

Yuval Itzchakov

Restoring state from an incremental RocksDB checkpoint

Hi,

We're using RocksDB as our state backend. Due to high backpressure in one of our operators, we can't get a savepoint to complete.

Since we have retained previous checkpoints, I was wondering whether they could serve as a restoration point. We are using RocksDB's incremental snapshot capability, so I was unsure: would an incremental snapshot be missing data, or does it point to the remaining parts of previous checkpoints?

Andrey Zagrebin-4

Re: Restoring state from an incremental RocksDB checkpoint

Hi Yuval,

You should be able to restore from the last checkpoint by restarting the job with the same checkpoint directory.
An incremental part is removed only if none of the retained checkpoints points to it, so a retained incremental checkpoint is not missing data.

Best,
Andrey

> On 13 Mar 2020, at 16:06, Yuval Itzchakov <[hidden email]> wrote:
>
> Hi,
>
> We're using RocksDB as a state backend. We've come to a situation where due to high backpressure in one of our operators, we can't make a savepoint complete.
>
> Since we have retained previous checkpoints, I was wondering if these would be eligible to serve as a restoration point, given that we are taking advantage of RocksDBs incremental snapshot capability, I was unsure. Would the incremental snapshot be missing data? or do they point to the remaining parts of previous checkpoints?
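
The cleanup rule described in the reply above (a shared incremental part is deleted only once no retained checkpoint points to it) can be illustrated with a small reference-counting model. This is a toy sketch, not Flink's implementation: the class name SharedFileRegistry, its methods, and the .sst file names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Flink code): tracks which shared files
// are still referenced by at least one retained checkpoint.
class SharedFileRegistry {
    // file name -> number of retained checkpoints referencing it
    private final Map<String, Integer> refCounts = new HashMap<>();
    // checkpoint id -> files that checkpoint references
    private final Map<Long, Set<String>> checkpoints = new HashMap<>();

    // Records a completed checkpoint and the shared files it points to.
    void register(long checkpointId, Set<String> files) {
        checkpoints.put(checkpointId, new TreeSet<>(files));
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    // Drops a checkpoint and returns the files that no remaining
    // checkpoint references, i.e. the only files that are safe to delete.
    List<String> discard(long checkpointId) {
        List<String> deletable = new ArrayList<>();
        Set<String> files = checkpoints.remove(checkpointId);
        if (files == null) {
            return deletable; // unknown checkpoint: nothing to clean up
        }
        for (String file : files) {
            int remaining = refCounts.merge(file, -1, Integer::sum);
            if (remaining == 0) {
                refCounts.remove(file);
                deletable.add(file);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        SharedFileRegistry registry = new SharedFileRegistry();
        // Checkpoint 2 is incremental: it reuses 000002.sst from checkpoint 1.
        registry.register(1L, Set.of("000001.sst", "000002.sst"));
        registry.register(2L, Set.of("000002.sst", "000003.sst"));
        // Only the file that checkpoint 2 does not reference is released.
        System.out.println(registry.discard(1L)); // prints [000001.sst]
    }
}
```

In this model, discarding checkpoint 1 leaves 000002.sst in place because checkpoint 2 still points to it, which is why the newer incremental checkpoint stays complete on its own.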

Andrey Zagrebin-4

Re: Restoring state from an incremental RocksDB checkpoint

As I understand it, you have already enabled retained checkpoints [1]. That is required here: after a job is cancelled, you can only restart it from a checkpoint if that checkpoint was retained.

Just in case, here are also the links to the docs on restoring from a retained checkpoint [2] and on how to find the path to it [3].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#retained-checkpoints

[2] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint

[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure
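
As a rough illustration of finding the path mentioned in [3], the sketch below looks for the newest chk-N directory under a job's checkpoint directory. It rests on several assumptions that should be checked against the linked docs: that the directory is reachable as a local or mounted path (not, say, an S3 or HDFS URI), that completed checkpoints live in subdirectories named chk-N, and that a completed checkpoint contains a file named _metadata. The class name LatestCheckpointFinder and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: finds the newest completed checkpoint directory
// under <checkpoint-dir>/<job-id>, assuming a chk-N/_metadata layout.
class LatestCheckpointFinder {

    // Extracts N from a directory named chk-N.
    static long checkpointNumber(Path chkDir) {
        return Long.parseLong(chkDir.getFileName().toString().substring("chk-".length()));
    }

    // Returns the chk-N directory with the highest N that has a _metadata file.
    static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        try (Stream<Path> children = Files.list(jobCheckpointDir)) {
            return children
                .filter(Files::isDirectory)
                .filter(p -> p.getFileName().toString().matches("chk-\\d+"))
                // Skip directories without metadata (assumed to be incomplete).
                .filter(p -> Files.isRegularFile(p.resolve("_metadata")))
                .max(Comparator.comparingLong(LatestCheckpointFinder::checkpointNumber));
        }
    }

    public static void main(String[] args) throws IOException {
        Path jobDir = Path.of(args.length > 0 ? args[0] : ".");
        Optional<Path> latest = findLatest(jobDir);
        if (latest.isPresent()) {
            // The docs in [2] describe resuming with: bin/flink run -s <path>
            System.out.println("bin/flink run -s " + latest.get().toAbsolutePath() + " <job jar and args>");
        } else {
            System.out.println("No completed checkpoint found under " + jobDir);
        }
    }
}
```

With incremental checkpoints, the chosen chk-N directory refers to files stored elsewhere under the same checkpoint directory, so the directory tree should be left in place as a whole when restoring.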

> On 14 Mar 2020, at 00:12, Andrey Zagrebin <[hidden email]> wrote:
>
> Hi Yuval,
>
> You should be able to restore from the last checkpoint by restarting the job with the same checkpoint directory.
> An incremental part is removed only if none of retained checkpoints points to it.
>
> Best,
> Andrey
