RocksDB native checkpoint time

Gyula Fóra
Hi!

Does anyone know what parameters might affect the RocksDB native checkpoint time? (Basically the sync part of the RocksDB incremental snapshots.)

It seems to take 60-70 secs in some cases for larger state sizes, and I wonder if there is anything we could tune to reduce this. Maybe it's only a matter of size, I don't know.

Any ideas would be appreciated :)
Gyula

Re: RocksDB native checkpoint time

Piotr Nowojski
Hi Gyula,

Have you read our tuning guide?

The synchronous part is mostly about flushing data to disk, so you could try to optimise your setup with that in mind: limiting the size of the page cache, speeding up the writes (using more/faster disks…), etc. Maybe you can also look at online resources on how to speed up calls to `org.rocksdb.Checkpoint#create`.
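
For reference, `Checkpoint#create` is reachable directly through the RocksDB Java API, so you can time it in isolation outside of Flink. A minimal sketch (the paths are placeholders, and `setCreateIfMissing` is only there to make the sketch self-contained):

```java
import org.rocksdb.Checkpoint;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CheckpointTiming {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-data")) { // placeholder path
            try (Checkpoint checkpoint = Checkpoint.create(db)) {
                long start = System.nanoTime();
                // Native checkpoint: typically flushes the memtables, then
                // hard-links the immutable SST files into the target directory.
                checkpoint.createCheckpoint("/tmp/rocksdb-checkpoint"); // must not exist yet
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println("Checkpoint#create took " + millis + " ms");
            }
        }
    }
}
```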

Piotrek

Re: RocksDB native checkpoint time

Stefan Richter
Hi,

Out of curiosity, does it happen with jobs that have a large number of states (column families) or also for jobs with few column families and just “big state”?

Best,
Stefan

Re: RocksDB native checkpoint time

Gyula Fóra
Thanks Piotr for the tips, we will play around with some settings.

@Stefan
It is a few columns but a lot of rows.

Gyula

Re: RocksDB native checkpoint time

Konstantin Knauf
Hi Gyula,

I looked into this a bit recently as well and did some experiments (on my local machine). The only parameter that significantly changed anything in this setup was reducing the total size of the write buffers (the number or size of the memtables). I was not able to find any online resources on the performance of checkpoint creation in RocksDB, so I am looking forward to your findings...
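
In case it helps with the experiments: a sketch of how the memtable settings can be changed through Flink's RocksDB state backend, using the `OptionsFactory` hook available in the Flink 1.8-era API (the checkpoint URI and the concrete values are placeholders and examples, not recommendations):

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class SmallWriteBuffers {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Incremental checkpoints enabled; the checkpoint URI is a placeholder.
        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints", true);
        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions currentOptions) {
                return currentOptions; // DB-level options left untouched
            }

            @Override
            public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
                // Smaller and fewer memtables mean less unflushed data when the
                // synchronous checkpoint phase forces a flush, at the cost of
                // more frequent flushes during normal processing.
                return currentOptions
                        .setWriteBufferSize(32 * 1024 * 1024) // 32 MB per memtable
                        .setMaxWriteBufferNumber(2);
            }
        });
        env.setStateBackend(backend);
    }
}
```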

Cheers,

Konstantin


Re: RocksDB native checkpoint time

Gyula Fóra
Hey,

I have collected some RocksDB logs for the snapshot itself, but I can't really wrap my head around where exactly the time is spent:
https://gist.github.com/gyfora/9a37aa349f63c35cd6abe2da2cf19d5b

The general pattern is that the time is spent between these two log lines:
2019/05/14-09:15:49.486455 7fbe6a8ee700 [db/db_impl_write.cc:1127] [new-timer-state] New memtable created with log file: #111757. Immutable memtables: 0.
2019/05/14-09:15:59.191010 7fb3cdc1d700 (Original Log Time 2019/05/14-09:15:59.191000) [db/db_impl_compaction_flush.cc:1216] Calling FlushMemTableToOutputFile with column family [new-timer-state], flush slots available 1, compaction slots available 1, flush slots scheduled 1, compaction slots scheduled 0

In this example these two operations are 10 seconds apart, but sometimes it's 40. Based on the log wording I don't understand what exactly is going on in between. Maybe someone with some hands-on experience with RocksDB might have some insights.
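
One experiment that might narrow it down (a sketch against a standalone RocksDB instance, with placeholder paths): flush explicitly first, then time `Checkpoint#create` on its own. If the checkpoint is fast after an explicit flush, the time is going into the memtable flush rather than into hard-linking the SST files:

```java
import org.rocksdb.Checkpoint;
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class FlushThenCheckpoint {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-data"); // placeholder path
             FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {

            long t0 = System.nanoTime();
            db.flush(flushOptions); // force the memtable flush up front and wait for it
            long t1 = System.nanoTime();

            try (Checkpoint checkpoint = Checkpoint.create(db)) {
                checkpoint.createCheckpoint("/tmp/rocksdb-checkpoint"); // must not exist yet
            }
            long t2 = System.nanoTime();

            System.out.printf("flush: %d ms, checkpoint after flush: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```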

Thanks,
Gyula