Hello,
I have a question about ZooKeeper and HDFS paths when running Flink on YARN.

My understanding is that when I run multiple Flink clusters against the same ZooKeeper/HDFS, I have to give each cluster different paths for settings such as state.backend.fs.checkpointdir and recovery.zookeeper.path.root.

If I run multiple Flink clusters/jobs on YARN and want to use checkpointing or JobManager HA, do I need to specify different paths for each cluster/job, or does YARN handle this automatically?

Regards,
Hironori Ogibayashi
Hey Hironori,
the storage directories (recovery.zookeeper.storageDir, state.backend.fs.checkpointdir) can, I think, stay the same: Flink should create random or job ID-specific subdirectories beneath them. The ZooKeeper root path (recovery.zookeeper.path.root), however, must be unique per cluster for HA.

If you upgrade to the soon-to-be-released 1.1 (the vote just passed and the binaries are being uploaded), this will be set automatically on YARN. You can also set it explicitly with the new CLI parameter -z <UNIQUE-NAME>, which sets recovery.zookeeper.path.root.

Hope this helps.

Ufuk

On Thu, Aug 4, 2016 at 2:53 AM, Hironori Ogibayashi wrote:
> [...]
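A minimal Python sketch of the split described in this reply: storage locations shared by all clusters, and only the ZooKeeper root path made unique per cluster. The config keys use the Flink 1.0/1.1-era `recovery.*` names; the quorum hosts, the HDFS paths, the `/flink/<name>` root layout, and the helper names (`build_cluster_conf`, `to_flink_conf`, `yarn_session_args`) are all hypothetical. The `-z` variant only mirrors the option mentioned above; the `./bin/yarn-session.sh` script path is an assumption.

```python
# Hypothetical helper for Flink 1.0/1.1-era HA settings (recovery.* keys).
# Storage locations are shared across clusters; per the reply, Flink should
# create random or job ID-specific subdirectories beneath them. Only the
# ZooKeeper root path has to differ per cluster.

SHARED_SETTINGS = {
    "recovery.mode": "zookeeper",
    "recovery.zookeeper.quorum": "zk1:2181,zk2:2181,zk3:2181",   # assumed hosts
    "recovery.zookeeper.storageDir": "hdfs:///flink/recovery",    # assumed path
    "state.backend": "filesystem",
    "state.backend.fs.checkpointdir": "hdfs:///flink/checkpoints",  # assumed path
}


def build_cluster_conf(cluster_name: str) -> dict:
    """Copy the shared settings and add a cluster-specific ZooKeeper root."""
    conf = dict(SHARED_SETTINGS)  # copy so SHARED_SETTINGS is never mutated
    conf["recovery.zookeeper.path.root"] = f"/flink/{cluster_name}"
    return conf


def to_flink_conf(conf: dict) -> str:
    """Render the settings as flink-conf.yaml style 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in conf.items())


def yarn_session_args(cluster_name: str) -> list:
    """Build a command line that passes the unique name via -z (the 1.1 CLI
    option from the thread). The script path is an assumption."""
    return ["./bin/yarn-session.sh", "-z", cluster_name]


if __name__ == "__main__":
    names = ["cluster-a", "cluster-b"]
    confs = [build_cluster_conf(n) for n in names]
    roots = {c["recovery.zookeeper.path.root"] for c in confs}
    # Two clusters sharing one root path would interfere with each other's HA state.
    assert len(roots) == len(names), "ZooKeeper root paths must be unique"
    for c in confs:
        print(to_flink_conf(c))
        print("---")
```

Since 1.1 is said to set the root path automatically on YARN, the explicit `-z` variant is mainly useful when you want a predictable, human-readable name per cluster.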
Ufuk,
Thank you for your answer. I understand now that only the ZooKeeper root path needs to be different, and I am glad to hear that the root path will be handled automatically on YARN in the next release.

Regards,
Hironori

2016-08-04 16:39 GMT+09:00 Ufuk Celebi wrote:
> [...]