Storage options for RocksDBStateBackend

ayush
Hello,

I have a few questions about checkpoint storage options when using the
RocksDBStateBackend. The Flink 1.2 documentation recommends it as the state
backend because of its ability to hold large state and take asynchronous snapshots.
For high availability, HDFS appears to be the recommended store for state backend
data, and the AWS deployment section also mentions that S3 can be used for
storing state backend data.

We don't want our Flink deployment to depend on a Hadoop cluster, so I have the
following questions:

1. Can we store RocksDBStateBackend data on any file system that Flink supports,
simply by using the corresponding file URL? Quite a few are listed here:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/filesystems.html
and here:
https://github.com/apache/flink/blob/master/docs/dev/batch/connectors.md

2. Has any work already been done to support Windows Azure Blob Storage for
storing state backend data? There is some documentation here:
https://github.com/apache/flink/blob/master/docs/dev/batch/connectors.md
Can we use this for that purpose?

3. If we use S3 for the state backend, is there any performance impact?

4. For high availability, can we use an NFS volume for the state backend with
"file://" URLs? Would that have any performance impact?
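For concreteness, these are the kinds of checkpoint URIs the questions are about. Flink picks the file system implementation from the URI's scheme, and a string like these is what one would pass as the checkpoint data URI of the RocksDBStateBackend. The sketch below uses only java.net.URI to break each candidate into scheme, authority, and path. The host, bucket, container, account, and mount-point names are hypothetical placeholders, and whether a given scheme actually works depends on the matching file system support being available, which is exactly what is being asked.

```java
import java.net.URI;

// Sketch only: splits candidate checkpoint URIs into the parts that determine
// which file system Flink would use. All names below are hypothetical placeholders.
public class CheckpointUris {

    // Returns "scheme=<s> authority=<a> path=<p>"; a missing authority prints as empty.
    static String describe(String checkpointUri) {
        URI uri = URI.create(checkpointUri);
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        return "scheme=" + uri.getScheme()
                + " authority=" + authority
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "hdfs://namenode:8020/flink/checkpoints",  // HDFS, the documented recommendation
            "s3://my-bucket/flink/checkpoints",         // S3 (question 3)
            "wasbs://container@account.blob.core.windows.net/flink/checkpoints", // Azure Blob (question 2)
            "file:///mnt/nfs/flink/checkpoints"         // NFS mount shared by all nodes (question 4)
        };
        for (String candidate : candidates) {
            System.out.println(describe(candidate));
        }
    }
}
```

Note that a "file://" URI carries no host, so for the NFS case every JobManager and TaskManager must see the same mount at the same path.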

-- Ayush