(DEPRECATED) Apache Flink User Mailing List archive.

Fwd: HDFS namenode and Flink


thomas

Fwd: HDFS namenode and Flink

Hello flinkers,

We are going to enable HDFS NameNode high availability in our cluster, and I
would like to know whether Flink needs any additional configuration.
We currently launch our Flink applications on YARN and use HDFS as the
file system for the state backend.

Thanks

Thomas

stefanobaghino

Re: HDFS namenode and Flink

I think the only keys relevant to your setup (high availability with an HDFS state backend) are:

state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints # fill in according to your needs
recovery.zookeeper.storageDir: /path/to/znode # again, fill in according to your needs
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181 # put your zk ensemble here

If these keys are set you should be good to go. I hope I've been of some help. :)
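
For the NameNode HA part of the question specifically, here is a minimal sketch of how these keys might look once HDFS is addressed through a logical nameservice instead of a single NameNode host. Everything here is an assumption layered on the keys above: `mycluster` is a hypothetical nameservice ID (the value of `dfs.nameservices` in `hdfs-site.xml`), the `/flink/...` paths are placeholders, and Flink is assumed to find the Hadoop client configuration (on YARN, typically through `HADOOP_CONF_DIR`). The sketch also assumes that `recovery.zookeeper.storageDir` names a file system directory for JobManager metadata rather than a znode, which is how the Flink 1.0 HA documentation describes it.

```yaml
# Sketch of flink-conf.yaml entries (Flink 1.0-era key names, as used in this thread).
# "mycluster" is a hypothetical HDFS nameservice ID, not a host name.
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs://mycluster/flink/checkpoints

recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181
# Assumption: a file system directory for JobManager metadata, not a ZooKeeper path.
recovery.zookeeper.storageDir: hdfs://mycluster/flink/recovery
```

URIs without an authority, such as hdfs:///path/to/checkpoints, should resolve against the default file system in the Hadoop configuration, so they would likely need no change as long as that default points at the nameservice.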

On Mon, May 23, 2016 at 12:37 PM, <[hidden email]> wrote:

Hello flinkers,

We will activate namenode HDFS high availability in our cluster, and I want to know if there is additional configuration for flink ?
We actually use YARN for launching our flink application, and hdfs filesystem to store the state backend

Thanks

Thomas

BR,

Stefano Baghino

Software Engineer @ Radicalbit

stefanobaghino

Re: HDFS namenode and Flink

One last quick note: if you're going to run individual jobs on YARN instead of a long-running session, make sure you give each job its own set of directories for ZK storage (definitely) and the state backend (possibly*); otherwise the jobs' state will end up entangled and you may see undefined behavior.

* I'm not really sure about this last one, perhaps some more experienced ML user can help me out on this.
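
To illustrate the note above, here is a sketch of two per-job configurations that keep their directories apart. The job names `job-a` and `job-b` and all paths are hypothetical; how each job is given its own configuration (for example, a separate configuration directory per job) is left open.

```yaml
# Each YAML document below stands for one job's own flink-conf.yaml;
# the "---" separators exist only for this listing.
---
# job-a (hypothetical name and paths)
recovery.zookeeper.storageDir: hdfs:///flink/job-a/recovery
state.backend.fs.checkpointdir: hdfs:///flink/job-a/checkpoints
---
# job-b (hypothetical name and paths)
recovery.zookeeper.storageDir: hdfs:///flink/job-b/recovery
state.backend.fs.checkpointdir: hdfs:///flink/job-b/checkpoints
```

Whether the checkpoint directory really has to differ is the open point in the footnote; separating it is simply the cautious choice.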

On Mon, May 23, 2016 at 12:54 PM, Stefano Baghino <[hidden email]> wrote:

I think the only keys of interest for your needs (highly available with HDFS state backend) are

state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints # fill in according to your needs
recovery.zookeeper.storageDir: /path/to/znode # again, fill in according to your needs
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181 # put your zk ensemble here

If these keys are set you should be good to go. I hope I've been of some help. :)


thomas

Re: HDFS namenode and Flink

OK, we already have all of this configuration in place, so we should be fine :-)

Thanks for the response!

Thomas

From: Stefano Baghino
Sent: Monday, May 23, 2016 12:58
To: [hidden email]
Reply-To: [hidden email]
Subject: Re: HDFS namenode and Flink

* I'm not really sure about this last one, perhaps some more experienced ML user can help me out on this.


Till Rohrmann

Re: HDFS namenode and Flink

Hi Thomas,

if you want to run multiple Flink clusters in HA mode, you should configure a distinct recovery.zookeeper.path.root for each cluster. This defines the root path in ZooKeeper under which the checkpoint metadata state handles and the job handles are stored. If you don’t do this, your clusters may recover each other's jobs.

Cheers,
Till
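
A sketch of what this could look like for two clusters sharing one ZooKeeper ensemble. The root paths `/flink-cluster-a` and `/flink-cluster-b` are hypothetical names; only the key `recovery.zookeeper.path.root` comes from the message above, and the quorum is the placeholder used earlier in the thread.

```yaml
# Each YAML document below stands for one cluster's own flink-conf.yaml;
# the "---" separators exist only for this listing.
---
# cluster A (hypothetical root path)
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181
recovery.zookeeper.path.root: /flink-cluster-a
---
# cluster B (hypothetical root path)
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181
recovery.zookeeper.path.root: /flink-cluster-b
```

With distinct roots, each cluster should only see its own job and checkpoint handles, even though both talk to the same ensemble.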

On Tue, May 24, 2016 at 7:38 AM, <[hidden email]> wrote:

OK, we already have all of this configuration in place, so we should be fine :-)

Thanks for the response!

Thomas
