flink snapshotting fault-tolerance

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: flink async snapshots

Aljoscha Krettek
Thats correct. With the fully async option the checkpoints take longer but you don't impact ongoing processing of elements. With the semi-async method snapshots take a shorter time but during the synchronous part no element processing can happen.

On Fri, 20 May 2016 at 15:04 Abhishek Singh <[hidden email]> wrote:
Yes. Thanks for explaining. 

On Friday, May 20, 2016, Ufuk Celebi <[hidden email]> wrote:
On Thu, May 19, 2016 at 8:54 PM, Abhishek R. Singh
<[hidden email]> wrote:
> If you can take atomic in-memory copies, then it works (at the cost of
> doubling your instantaneous memory). For larger state (say rocks DB), won’t
> you have to stop the world (atomic snapshot) and make a copy? Doesn’t that
> make it synchronous, instead of background/async?

Hey Abhishek,

that's correct. There are two variants for RocksDB:

- semi-async (default): snapshot is taking via RocksDB backup feature
to backup to a directory (sync). This is then copied to the final
checkpoint location (async, e.g copy to HDFS).

- fully-async: snapshot is taking via RocksDB snapshot feature (sync,
but no full copy and essentially "free"). With this snapshot we
iterate over all k/v-pairs and copy them to the final checkpoint
location (async, e.g. copy to HDFS).

You enable the second variant via: rocksDbBackend.enableFullyAsyncSnapshots();

This is only part of the 1.1-SNAPSHOT version though.

I'm not too familiar with the performance characteristics of both
variants, but maybe Aljoscha can chime in.

Does this clarify things for you?

– Ufuk
12