Fwd: HDFS namenode and Flink

Fwd: HDFS namenode and Flink

thomas
Hello flinkers,

We are about to enable HDFS NameNode high availability in our cluster, and I
would like to know whether Flink needs any additional configuration.
We currently launch our Flink applications on YARN and use the HDFS
file system for the state backend.

Thanks

Thomas
Re: HDFS namenode and Flink

stefanobaghino
I think the only keys of interest for your needs (high availability with an HDFS state backend) are:

state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints # fill in according to your needs
recovery.zookeeper.storageDir: hdfs:///path/to/recovery # a file system directory for JobManager metadata, not a znode; again, fill in according to your needs
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181 # put your zk ensemble here

If these keys are set you should be good to go. I hope I've been of some help. :)
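
On the NameNode HA part of the question, here is a minimal sketch, assuming the HA nameservice is called `mycluster` (a hypothetical ID; the real one is whatever `dfs.nameservices` is set to in the cluster's hdfs-site.xml) and that the paths are placeholders. As far as I know, the HDFS paths should name the logical nameservice rather than a single NameNode host, or stay as `hdfs:///...`, which resolves against Hadoop's `fs.defaultFS`. Flink also has to find the Hadoop configuration, which on YARN normally comes from `HADOOP_CONF_DIR` or `YARN_CONF_DIR`.

```yaml
# flink-conf.yaml (sketch; "mycluster" and the directories are assumptions)
state.backend: filesystem
# Logical nameservice instead of a NameNode host:port, so the path stays valid after a failover
state.backend.fs.checkpointdir: hdfs://mycluster/flink/checkpoints
recovery.zookeeper.storageDir: hdfs://mycluster/flink/recovery
```

If the existing paths already use `hdfs:///` or the nameservice ID, there is probably nothing Flink-specific left to change.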

On Mon, May 23, 2016 at 12:37 PM, <[hidden email]> wrote:
Hello flinkers,

We are about to enable HDFS NameNode high availability in our cluster, and I would like to know whether Flink needs any additional configuration.
We currently launch our Flink applications on YARN and use the HDFS file system for the state backend.

Thanks

Thomas



--
BR,
Stefano Baghino

Software Engineer @ Radicalbit
Re: HDFS namenode and Flink

stefanobaghino
One last quick note: if you're going to run individual jobs on YARN instead of a long-running session, make sure you give each job its own set of directories for ZK storage (surely) and for the state backend (possibly*). Otherwise the state of the jobs will end up entangled and you may see undefined behavior.

* I'm not really sure about this last one; perhaps a more experienced ML user can help me out on this.
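
A rough sketch of what per-job directories could look like, written as two YAML documents that each stand for one job's flink-conf.yaml. The job names and paths are made up for illustration, and the sketch assumes `recovery.zookeeper.storageDir` points to a file system directory. Whether the checkpoint directory really needs to differ per job is the open point from the footnote.

```yaml
# Job A's flink-conf.yaml (hypothetical job name and paths)
recovery.zookeeper.storageDir: hdfs:///flink/recovery/job-a
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints/job-a  # possibly optional, see the footnote
---
# Job B's flink-conf.yaml: same keys, separate directories
recovery.zookeeper.storageDir: hdfs:///flink/recovery/job-b
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints/job-b
```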

On Mon, May 23, 2016 at 12:54 PM, Stefano Baghino <[hidden email]> wrote:
I think the only keys of interest for your needs (high availability with an HDFS state backend) are:

state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints # fill in according to your needs
recovery.zookeeper.storageDir: hdfs:///path/to/recovery # a file system directory for JobManager metadata, not a znode; again, fill in according to your needs
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181 # put your zk ensemble here

If these keys are set you should be good to go. I hope I've been of some help. :)

Re: HDFS namenode and Flink

thomas
OK, we already have all of this configuration set up, so it should be fine :-)

Thanks for the response!

Thomas


From: Stefano Baghino
Sent: Monday, May 23, 2016 12:58
Reply-To: [hidden email]
Subject: Re: HDFS namenode and Flink

One last quick note: if you're going to run individual jobs on YARN instead of a long-running session, make sure you give each job its own set of directories for ZK storage (surely) and for the state backend (possibly*). Otherwise the state of the jobs will end up entangled and you may see undefined behavior.

* I'm not really sure about this last one; perhaps a more experienced ML user can help me out on this.


Re: HDFS namenode and Flink

Till Rohrmann

Hi Thomas,

if you want to run multiple Flink clusters in HA mode, you should configure a cluster-specific recovery.zookeeper.path.root for every cluster. This defines the root path in ZooKeeper under which the state handles for checkpoint metadata and the job handles are stored. If you don't do this, your clusters will recover jobs that belong to a different cluster.

Cheers,
Till
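
To illustrate the point about `recovery.zookeeper.path.root`, here is a minimal sketch of two clusters sharing one ZooKeeper ensemble, again written as two YAML documents, one per cluster's flink-conf.yaml. The root paths `/flink/cluster-1` and `/flink/cluster-2` are assumptions chosen for the example; any distinct znode paths should serve the same purpose.

```yaml
# Cluster 1's flink-conf.yaml (root path is a made-up example)
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181
recovery.zookeeper.path.root: /flink/cluster-1
---
# Cluster 2's flink-conf.yaml: same ensemble, different root znode
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-ensemble-1:2181,zk-ensemble-2:2181,zk-ensemble-3:2181
recovery.zookeeper.path.root: /flink/cluster-2
```

With distinct roots, each cluster only sees the job and checkpoint handles stored under its own znode.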


On Tue, May 24, 2016 at 7:38 AM, <[hidden email]> wrote:
OK, we already have all of this configuration set up, so it should be fine :-)

Thanks for the response!

Thomas

