Hello,
I've been going through the documentation for task-local recovery and came across
this section, which says that with incremental checkpoints enabled, task-local recovery incurs no additional storage cost. The caveat is that the task-local recovery state and all the RocksDB local state must be on a single physical
device so that hard links can be used. How can I ensure that our RocksDB local state is on the same physical device as the task-local recovery data?
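For context, this is roughly how I would check the hard-link requirement from inside the TaskManager pod. It is only a minimal sketch: the helper names same_device and can_hard_link are my own (not anything from Flink), and I am assuming that a matching st_dev plus a successful test link is a reasonable proxy for "same physical device".

```python
import os
import tempfile


def same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def can_hard_link(src_dir: str, dst_dir: str) -> bool:
    """Probe whether a file in src_dir can be hard-linked into dst_dir.

    Creates a temporary file in src_dir, tries to link it into dst_dir,
    and removes the link again. Returns False if the OS refuses the link
    (for example when the directories are on different mounts).
    """
    with tempfile.NamedTemporaryFile(dir=src_dir) as src:
        dst = os.path.join(dst_dir, os.path.basename(src.name) + ".link")
        try:
            os.link(src.name, dst)
        except OSError:
            return False
        os.remove(dst)
        return True
```

I would call both helpers with the RocksDB local directory and the task-local recovery directory as arguments. The link probe is the stricter of the two, since the OS can refuse a hard link across separate mount points even when the device IDs match.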
I came across a couple of config options that let us point the RocksDB local state and the task-local recovery state to directories of our choosing. Do I need to set both for task-local recovery to work correctly? What are the default paths
if I don't set these configs? (We are using Kubernetes; assume that /opt/flink/local-state below corresponds to a given physical drive.)
state.backend.rocksdb.localdir: /opt/flink/local-state/rocksdblocaldir
taskmanager.state.local.root-dirs: /opt/flink/local-state/tasklocaldir
Do these configs make any difference if we turn off incremental checkpointing for RocksDB? Also, setting this localdir for RocksDB won't affect checkpointing or where the checkpoints are stored, right?
After setting the above two configs, I ran into an issue where the job would just disappear (or fail) if the TaskManager pod got killed. Without these two configs, the job resumed correctly from the last checkpoint after the TaskManager pod was killed.
Thanks,
Sonam