high-availability: zookeeper
high-availability.zookeeper.quorum: <host>:<port>
high-availability.cluster-id: /cluster_one # important: customize per cluster
high-availability.storageDir: hdfs:///flink/recovery
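To illustrate the "customize per cluster" comment above: a hedged sketch of two Flink clusters sharing one ZooKeeper ensemble, where only the cluster-id differs (the quorum address zk-0:2181 is an assumption, not from the original mail):

```yaml
# Sketch: flink-conf.yaml fragments for two Flink clusters sharing
# one ZooKeeper ensemble. zk-0:2181 is an assumed quorum address.

# cluster one
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181
high-availability.cluster-id: /cluster_one
high-availability.storageDir: hdfs:///flink/recovery

# cluster two -- only the cluster-id changes, so each cluster keeps
# its own znodes in ZooKeeper and its own recovery metadata in HDFS
high-availability.cluster-id: /cluster_two
```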
I am working on a POC High Availability installation of Flink on top of Kubernetes with HDFS as the data storage location. I am not finding much documentation on doing this, or I am finding the documentation in parts and may not be putting it together correctly. I think it falls between being an HDFS thing and a Flink thing. I am deploying to Kubernetes using the flink:1.7.0-hadoop27-scala_2.11 container off of Docker Hub.

I think these are the things I need to do:

1) Set up an hdfs-site.xml file per https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Deployment

2) Set the HADOOP_CONF_DIR environment variable to the location of that file per https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#hdfs

3) Create a flink-conf.yaml file that looks something like:

fs.default-scheme: hdfs://
state.backend: rocksdb
state.savepoints.dir: hdfs://flink/savepoints
state.checkpoints.dir: hdfs://flink/checkpoints

4) Dance a little jig when it works.

Has anyone set this up? If so, am I missing anything?

-Steve
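For step 2, one common approach on Kubernetes is to ship the Hadoop config files in a ConfigMap and point HADOOP_CONF_DIR at the mount path. A minimal sketch of a pod spec fragment — the ConfigMap name "hadoop-conf" and the mount path are assumptions, not from the original mail:

```yaml
# Sketch: container spec fragment, assuming hdfs-site.xml (and
# core-site.xml) are stored in a ConfigMap named "hadoop-conf".
containers:
  - name: jobmanager
    image: flink:1.7.0-hadoop27-scala_2.11
    env:
      - name: HADOOP_CONF_DIR     # Flink reads Hadoop config from here
        value: /etc/hadoop/conf
    volumeMounts:
      - name: hadoop-conf
        mountPath: /etc/hadoop/conf
volumes:
  - name: hadoop-conf
    configMap:
      name: hadoop-conf
```

The same env/volumeMounts stanza would go on the taskmanager containers as well, so every Flink process resolves the same HDFS namenodes.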
Konstantin Knauf | Solutions Architect
+49 160 91394525
Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany