Re: Clarification on state backend parameters
Posted by
Stefan Richter on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Clarification-on-state-backend-parameters-tp11419p11438.html
If you have configured RocksDB as backend, Flink typically has multiple RocksDB instances per job - one for each parallel operator instance with keyed state. Those RocksDB instances live local to their corresponding operator instances. Parameter state.backend.rocksdb.
checkpointdir configures the working directory of those instances. Working directories are used to store files during the operation of RocksDB, therefore it should mainly allow for fast access, e.g. be resident on a local disk filesystem. In contrast to that, state.backend.fs.checkpointdir specifies where checkpoint data is stored. Think of this as a backup directory, where the most important properties are availability and fault tolerance. This would typically be located on a distributed file system like HDFS that is also accessible from each node, so that operators can be recovered on different machines in case of machine failures.
I thought rocksdb is used to as a store backend. If that is the case then why would are there 2 configuration parameter? Or in other words what is the behavior if both state.backend.fs.checkpointdir and state.backend.rocksdb is set?