How to avoid path conflict in zookeeper/HDFS

Hironori Ogibayashi
Hello,

I have a question about ZooKeeper and HDFS paths when running
Flink on YARN.

In my understanding, when I run multiple Flink clusters using the same
ZooKeeper/HDFS, I have to specify different paths for each of them, e.g.
state.backend.fs.checkpointdir,
recovery.zookeeper.path.root etc.

If I run multiple Flink clusters/jobs on YARN and want to use checkpointing or
JobManager HA, do I need to specify different paths for each cluster/job? Or
does YARN handle this nicely?

Regards,
Hironori Ogibayashi

Re: How to avoid path conflict in zookeeper/HDFS

Ufuk Celebi
Hey Hironori,

The storage directories (recovery.zookeeper.storageDir,
state.backend.fs.checkpointdir) can stay the same, I think (either
random or jobID-specific subfolders should be created there). The
ZooKeeper root path (recovery.zookeeper.path.root) needs to be unique
per cluster for HA.
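
As a minimal sketch of that rule (not an official Flink tool), the Python
snippet below builds per-cluster config overrides in which the two storage
directories are shared and only the ZooKeeper root path differs. The three
config keys are the ones named in this thread; the helper names, the
hdfs:///flink base directory, the /flink/<name> root layout and the cluster
names are hypothetical values chosen for illustration.

```python
def flink_ha_overrides(cluster_name, shared_storage="hdfs:///flink"):
    """Return config overrides for one cluster.

    Storage directories are shared across clusters; only the ZooKeeper
    root path is made unique. The paths used here are assumptions.
    """
    if not cluster_name or "/" in cluster_name:
        raise ValueError("cluster_name must be a non-empty name without '/'")
    return {
        # Shared by all clusters (per the thread, sub folders are created inside).
        "recovery.zookeeper.storageDir": f"{shared_storage}/recovery",
        "state.backend.fs.checkpointdir": f"{shared_storage}/checkpoints",
        # Must be unique per cluster for HA.
        "recovery.zookeeper.path.root": f"/flink/{cluster_name}",
    }


def to_conf_lines(overrides):
    """Render overrides as 'key: value' lines, sorted by key."""
    return [f"{key}: {value}" for key, value in sorted(overrides.items())]


if __name__ == "__main__":
    for name in ("cluster-a", "cluster-b"):  # hypothetical cluster names
        print(f"# {name}")
        print("\n".join(to_conf_lines(flink_ha_overrides(name))))
```

Running it prints one block of key/value lines per cluster; the only line that
differs between the blocks is recovery.zookeeper.path.root.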

If you upgrade to the soon-to-be-released 1.1 (the vote just passed, binaries
are being uploaded), this will be set automatically for YARN. You can
also specify it via the new CLI parameter -z <UNIQUE-NAME> (this sets
recovery.zookeeper.path.root).

Hope this helps.

Ufuk

On Thu, Aug 4, 2016 at 2:53 AM, Hironori Ogibayashi
<[hidden email]> wrote:

> [...]

Re: How to avoid path conflict in zookeeper/HDFS

Hironori Ogibayashi
Ufuk,

Thank you for your answer.
I understand that only the ZooKeeper root path needs to be different, and I am
glad to hear that YARN will handle the root path automatically in the next release.

Regards,
Hironori

2016-08-04 16:39 GMT+09:00 Ufuk Celebi <[hidden email]>:

> [...]