[DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Posted by Stephan Ewen on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/DISCUSS-Change-default-for-RocksDB-timers-Java-Heap-in-RocksDB-tp32189.html

Hi all!

I would suggest a change of the current default for timers. A bit of background:

  - Timers (for windows, process functions, etc.) are state that is managed and checkpointed as well.
  - When using the MemoryStateBackend and the FsStateBackend, timers are kept on the JVM heap, like regular state.
  - When using the RocksDBStateBackend, timers can be kept in RocksDB (like other state) or on the JVM heap. The JVM heap is the default though!

I find this a bit un-intuitive and would propose to change this to let the RocksDBStateBackend store all state in RocksDB by default.
The rationale being that if there is a tradeoff (like here), safe and scalable should be the default and unsafe performance be an explicit choice.

This sentiment seems to be shared by various users as well, see https://twitter.com/StephanEwen/status/1214590846168903680 and https://twitter.com/StephanEwen/status/1214594273565388801
We would of course keep the switch and mention in the performance tuning section that this is an option.

# RocksDB State Backend Timers on Heap
  - Pro: faster
  - Con: not memory safe, GC overhead, longer synchronous checkpoint time, no incremental checkpoints

#  RocksDB State Backend Timers on in RocksDB
  - Pro: safe and scalable, asynchronously and incrementally checkpointed
  - Con: performance overhead.

Please chime in and let me know what you think.

Best,
Stephan