Replacing a server in Zookeeper Quorum

Aaron Langford
Hello Flink Community,

I'm working on an HA setup of Flink 1.8.1 on AWS EMR and have some questions about how Flink interacts with ZooKeeper when one of the servers in the quorum specified in flink-conf.yaml goes down and is replaced by a machine with a new IP address.

Currently, I set high-availability.zookeeper.quorum to the IP addresses of the 3 master nodes of the EMR cluster, as this is what AWS does to enable a highly available YARN setup.

EMR master nodes may go down entirely and need to be replaced by a machine with a different instance IP address. I will almost certainly need to perform a rolling configuration update to account for this. But will I need to restart Flink for the change to take effect? Is there a way to dynamically reload these configs when they change?

Aaron
Re: Replacing a server in Zookeeper Quorum

Yang Wang
Hi Aaron,

I don't think this is Flink's responsibility. Flink uses Apache Curator to connect to the ZooKeeper
servers. If multiple ZooKeeper servers are specified, Curator has an automatic retry mechanism.
However, your problem is that the IP address will change when the EMR instance restarts. Currently,
Flink does not support reloading its configuration dynamically. One quick solution is to use a static
IP for the EMR master node [1].
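
To make the retry idea concrete, here is a rough Python sketch of trying each server of a quorum string in turn, for a bounded number of rounds. It is only an illustration of the general pattern, not Curator's or Flink's actual implementation; the function names (parse_quorum, tcp_reachable, first_reachable) and the backoff scheme are assumptions made up for this example.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def parse_quorum(quorum: str) -> List[Tuple[str, int]]:
    """Split "host1:2181,host2:2181" into (host, port) pairs."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No port given: fall back to ZooKeeper's default client port.
            host, port = entry, "2181"
        servers.append((host, int(port)))
    return servers


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    quorum: str,
    attempts: int = 3,
    wait_seconds: float = 0.0,
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[Tuple[str, int]]:
    """Try every server in order; repeat up to `attempts` rounds."""
    servers = parse_quorum(quorum)
    for attempt in range(attempts):
        for host, port in servers:
            if probe(host, port):
                return (host, port)
        if attempt < attempts - 1:
            # Hypothetical exponential backoff between rounds.
            time.sleep(wait_seconds * (2 ** attempt))
    return None
```

The point of the sketch is that a client can survive one server being down as long as another entry in the list still resolves and answers. It does not help when every configured address has gone stale, which is the situation the question is about.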


Best,
Yang



(In reply to Aaron Langford's post of Wed, Jan 22, 2020, 1:48 AM.)
Re: Replacing a server in Zookeeper Quorum

tison
I second Yang: setting a static IP for the EMR master node would be a
workaround.

Even in the ZooKeeper world, reconfig is a new and immature feature since 3.5.3,
while Flink uses ZooKeeper 3.4.x. It would be a breaking change if we "just"
upgraded the ZooKeeper version, but hopefully the Flink community will keep
digging for a safe upgrade path.

Best,
tison.


(In reply to Yang Wang's post of Wed, Jan 22, 2020, 10:34 AM.)
Re: Replacing a server in Zookeeper Quorum

Aaron Langford
In reply to this post by Yang Wang
My apologies, I ended up resolving this through experimentation. AWS replaces master nodes with the same internal DNS names, so configurations need not be changed.
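
As an illustration of why DNS names help here, the following Python sketch builds a value for high-availability.zookeeper.quorum from hostnames and flags any entries that are raw IP addresses, since those would go stale if a node came back with a new address. The helper names and the hostnames used below (master-1.example.internal and so on) are hypothetical, not the names EMR actually assigns, and the parsing only handles plain host:port entries (no IPv6 literals).

```python
import ipaddress
from typing import Iterable, List


def is_ip_literal(host: str) -> bool:
    """Return True if `host` is an IP address rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def build_quorum(hosts: Iterable[str], port: int = 2181) -> str:
    """Join hosts into a "host:port,host:port" quorum string."""
    return ",".join(f"{host}:{port}" for host in hosts)


def ip_entries(quorum: str) -> List[str]:
    """Return the quorum entries whose host part is an IP literal."""
    flagged = []
    for entry in quorum.split(","):
        entry = entry.strip()
        # Host is everything before the last ":"; if there is no port,
        # the whole entry is the host.
        host = entry.rpartition(":")[0] or entry
        if is_ip_literal(host):
            flagged.append(entry)
    return flagged


# Hypothetical hostnames, for illustration only.
quorum = build_quorum(
    ["master-1.example.internal", "master-2.example.internal", "master-3.example.internal"]
)
print("high-availability.zookeeper.quorum: " + quorum)
```

With a quorum made only of names, ip_entries(quorum) comes back empty, and the value in flink-conf.yaml can stay the same as long as the replacement node keeps its DNS name.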

Aaron 


Re: Replacing a server in Zookeeper Quorum

tison
Good to know :-)

Best,
tison.


(In reply to Aaron Langford's post of Wed, Jan 22, 2020, 10:44 AM.)