Hi All,
I read the docs, but I still have the following question: for stateful stream processing, is HDFS mandatory? In some places I see that it is required, while in other places I see that RocksDB can be used. I just want to know whether HDFS is mandatory for stateful stream processing. Thanks!
Hi Kant,
Jumping in here; I'd welcome corrections if I'm wrong about any of this.

The short answer is no, HDFS is not necessary for stateful stream processing. In the minimal case, you can use the MemoryStateBackend, which backs up your state onto the JobManager. In any production scenario, though, you will want more durability for your checkpoints and support for larger state, so you should use either the FsStateBackend or the RocksDBStateBackend. With either of these, you will need a checkpoint directory on a filesystem that is accessible by all TaskManagers. The filesystem for this checkpoint directory is configured via the state.backend.* options; see the Hadoop Compatible File Systems documentation for alternatives to HDFS (S3, for example).

Choosing between RocksDBStateBackend and FsStateBackend is a separate decision. FsStateBackend keeps in-flight state in memory and writes it to your durable filesystem only when a checkpoint is initiated. RocksDBStateBackend instead stores in-flight state on local disk (in RocksDB); when a checkpoint is initiated, the appropriate state is then written to the durable filesystem. Because it keeps state on disk, RocksDBStateBackend can handle much larger state than FsStateBackend on equivalent hardware.

I'm drawing most of this from this page:

Does that make sense?

Cheers,
Wolfe

~ Brian Wolfe

On Fri, Apr 7, 2017 at 2:32 AM, kant kodali <[hidden email]> wrote:
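To make the backend choice concrete, here is a sketch of the relevant flink-conf.yaml entries. The key names follow the Flink docs of that era and the hostnames/paths are placeholders, so check the documentation for your Flink version before copying:

```yaml
# flink-conf.yaml (illustrative sketch; verify keys against your Flink version's docs)

# Choose the state backend: "jobmanager" (MemoryStateBackend),
# "filesystem" (FsStateBackend), or "rocksdb" (RocksDBStateBackend).
state.backend: rocksdb

# Durable checkpoint directory, reachable by all TaskManagers.
# Any supported filesystem works here: hdfs://..., s3://..., or file://... on a shared mount.
state.backend.fs.checkpointdir: hdfs://namenode:9000/flink/checkpoints
```

The same choice can also be made per job in the DataStream API via env.setStateBackend(...), which overrides the cluster-wide default.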
Hi Wolfe, that's all correct. Thank you!

2017-04-07 16:34 GMT+02:00 Brian Wolfe <[hidden email]>: