Reference configs for HA / RocksDB / YARN / Zookeeper / HDFS


Torok, David

Hi,

 

Forgive me if parts of this question have been answered before, but I’d like help resolving some confusion about the documentation, and I haven’t been able to find a good example anywhere of an enterprise-style setup.  If anyone has a sample HA / YARN / ZK / RocksDB configuration, could you share it?

 

We are currently using Flink 1.2.0 on Hortonworks (an older version, 2.2.9, based on Hadoop 2.6.0).  We’re trying out a small sample cluster with 9 YARN client nodes.

 

1.       We have large state and long time windows, and therefore want to use RocksDB as our state backend.  Is it typical or best practice for RocksDB to keep its working state on local disk for speed, while the checkpoints go to HDFS for recovery / HA?  Or does everything live in HDFS?  My understanding from the docs is that “The RocksDBStateBackend holds in-flight data in a RocksDB data base that is (per default) stored in the TaskManager data directories” (is this location set automatically by YARN?), and that the checkpoint directory is set either via “state.backend.fs.checkpointdir: hdfs://namenode:40010/flink/checkpoints” in flink-conf.yaml or programmatically, e.g. new RocksDBStateBackend(statepath).

2.       It’s unclear to me whether YARN automatically provides Flink with the Zookeeper information, or whether I also need to set the Zookeeper info in flink-conf.yaml.  The examples seem to imply that the ZK settings might only be needed if you start your own Zookeeper rather than using one that already exists.  Do I need to configure it for HA on YARN?

3.       I’ve seen conflicting information about including HADOOP_CLASSPATH: some say it causes many conflicts with Flink’s libraries, whereas others say it’s needed to resolve various deserialization errors at runtime.

4.       Someone suggested that we build Flink from source ourselves against the Hortonworks distribution; I’m really hoping that’s not necessary.
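
To make questions 1 and 2 concrete, the setup I have in mind would look roughly like the flink-conf.yaml fragment below.  This is an untested sketch that assumes the key names from the Flink 1.2 documentation.  The checkpoint URI is the example value quoted above; the Zookeeper hosts, the recovery path, and the attempt count are placeholders, not values from our cluster.

```yaml
# Question 1: RocksDB as the state backend, with checkpoints written to HDFS.
# The RocksDB working files are left at their default (local) location here.
state.backend: rocksdb
state.backend.fs.checkpointdir: hdfs://namenode:40010/flink/checkpoints

# Question 2: JobManager HA through an existing Zookeeper quorum.
# zk1/zk2/zk3 are hypothetical host names.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Placeholder HDFS path for JobManager recovery metadata
# (key name as in the 1.2 docs; it may differ in other versions).
high-availability.zookeeper.storageDir: hdfs:///flink/recovery
high-availability.zookeeper.path.root: /flink

# On YARN: how many times the application master may be restarted (placeholder value).
yarn.application-attempts: 10
```

If this is the right shape, the remaining question is whether any of the Zookeeper entries can be dropped when running on YARN.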

 

Appreciate any info as we learn how to productionize our Flink clusters!

 

Best Regards

Dave