Flink 1.7.1 Inaccessible


Flink 1.7.1 Inaccessible

Seye Jin
I am getting "service temporarily unavailable due to an ongoing leader election" when I try to access the Flink UI. The JobManager has HA configured. I have restarted the JobManager multiple times, but with no luck. I also tried submitting my job from the console, but I get the same message.
When I view the logs during a JM restart I see no errors; they even say "jobmanager was granted leadership with ..."
Any hints on how to remediate this issue would be much appreciated. I have multiple stateful applications running, so I cannot start a new cluster (since I am also unable to take a savepoint).
Thanks


Re: Flink 1.7.1 Inaccessible

Till Rohrmann
Hi Seye,

Usually, Flink's web UI should be accessible after a successful leader election. Could you share the cluster logs with us so we can see what's going on? Without this information it is hard to tell what's going wrong.

What you could also do is check the ZooKeeper znode which represents the cluster id (if you are using YARN, it should be something like /flink/application_...). There you could check the contents of the leader znode of the web UI (leader/rest_server_lock). It should contain the address of the current leader, if there is one.

Cheers,
Till
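
The znode check suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up for this example, the root /flink is taken from the YARN example in the message, and `client` stands for any ZooKeeper client whose `get(path)` returns a `(data, stat)` pair (kazoo's `KazooClient` behaves this way). The znode payload may not be plain text, so the sketch only pulls out printable runs, in which the leader address should be visible if one was published.

```python
import re

# Runs of printable ASCII characters inside a bytes payload.
_PRINTABLE = re.compile(rb"[\x20-\x7e]+")


def rest_leader_znode(cluster_id, root="/flink"):
    """Build the path of the web UI (REST server) leader znode.

    Assumes the layout <root>/<cluster-id>/leader/rest_server_lock.
    """
    parts = [root.strip("/"), cluster_id.strip("/"), "leader", "rest_server_lock"]
    return "/" + "/".join(p for p in parts if p)


def printable_runs(payload, min_len=4):
    """Return the printable ASCII runs of at least min_len characters."""
    runs = _PRINTABLE.findall(payload)
    return [r.decode("ascii") for r in runs if len(r) >= min_len]


def read_rest_leader(client, cluster_id, root="/flink"):
    """Fetch the leader znode through a client exposing get(path) -> (data, stat)."""
    data, _stat = client.get(rest_leader_znode(cluster_id, root))
    return printable_runs(data or b"")
```

With kazoo, for instance, one would create `KazooClient(hosts=...)` pointing at the ZooKeeper quorum, call `start()`, and pass the client to `read_rest_leader` together with the cluster id. An empty result would suggest that no REST leader has published its address, which is one thing to compare against the "granted leadership" line in the logs.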


RE: EXT :Re: Flink 1.7.1 Inaccessible

Martin, Nick-2

Seye, are you running Flink and ZooKeeper in Docker? I’ve had problems with JobManagers not resolving the hostnames for ZooKeeper when starting a stack on Docker.
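
A quick way to test for this kind of resolution problem is to run a small check inside the JobManager container. The sketch below is only an illustration: the function names are made up, and the input is assumed to be the same comma-separated host:port list that is configured as high-availability.zookeeper.quorum, with 2181 assumed when no port is given.

```python
import socket


def parse_quorum(quorum):
    """Split 'host1:2181,host2:2181' into (host, port) pairs."""
    pairs = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        # Fall back to ZooKeeper's usual client port when none is given.
        pairs.append((host, int(port) if port else 2181))
    return pairs


def check_resolution(quorum):
    """Map each ZooKeeper host to its resolved addresses, or to an error string."""
    results = {}
    for host, port in parse_quorum(quorum):
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[host] = "unresolved: %s" % exc
    return results
```

Any host that maps to an "unresolved" string cannot be resolved from the container where the check runs, which would point at the Docker network or DNS setup rather than at Flink itself.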

 

Notice: This e-mail is intended solely for use of the individual or entity to which it is addressed and may contain information that is proprietary, privileged and/or exempt from disclosure under applicable law. If the reader is not the intended recipient or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. This communication may also contain data subject to U.S. export laws. If so, data subject to the International Traffic in Arms Regulation cannot be disseminated, distributed, transferred, or copied, whether incorporated or in its original form, to foreign nationals residing in the U.S. or abroad, absent the express prior approval of the U.S. Department of State. Data subject to the Export Administration Act may not be disseminated, distributed, transferred or copied contrary to U. S. Department of Commerce regulations. If you have received this communication in error, please notify the sender by reply e-mail and destroy the e-mail message and any physical copies made of the communication.
 Thank you. 
*********************




Re: EXT :Re: Flink 1.7.1 Inaccessible

Seye Jin

Hi Till, there were no WARN or ERROR log messages. We have been using Flink for a long time and have never experienced this issue (though we did just migrate from 1.4 to 1.7). It was a critical app, and after multiple attempts to resolve the problem we updated *high-availability.cluster-id* and attached the TMs to a new JM (even though we sadly lost state).

@Nick, we are indeed running Flink and ZooKeeper in Docker, and we verified that the hostnames could be resolved. The JobManager also got a new leader ID and even acknowledged registering the jobs running on the cluster (even though checkpoints were not being triggered).

We are keeping a close eye on this issue, trying to reproduce it, and sifting through the Kibana logs; we will post here if we find anything.

P.S. It looks similar to something that happened a while back (http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/<[hidden email]>)



Re: EXT :Re: Flink 1.7.1 Inaccessible

Till Rohrmann
Hmm, this is strange. More information from the logs would help us better understand the problem.

The link to the related discussion does not work. Maybe you could repost it.

Cheers,
Till


Re: EXT :Re: Flink 1.7.1 Inaccessible

Seye Jin
You will have to copy and paste the link in its entirety; Gmail is not recognizing it correctly.
http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/<[hidden email]>
