Understanding Job Manager Web UI in HA Mode

chiggi_dev
Hi,

We configured Job Manager HA with the Kubernetes strategy and found that the Web UI of all 3 Job Managers is accessible on their configured rpc addresses. Nothing in the Web UI indicates which Job Manager is the leader or which one the Task Managers are registered with. However, from the logs I can see that a Task Manager registers with one Job Manager and, if that one becomes unavailable, can switch to a standby instance.

Having little to no experience with HA, I wanted to know whether this is the expected behavior. I had assumed that only the leader's Web UI would be accessible.

Thanks,
Chirag

Re: Understanding Job Manager Web UI in HA Mode

Till Rohrmann
Hi Chirag,

When standby JobManagers are started, Flink already starts a web server in each process to serve REST requests. These servers do not necessarily query the JobManager they were started with; instead, they always forward requests to the current leading JobManager. That way all web UIs are responsive, but they all query the current leader. So for querying information you don't need to know which process is currently the leader.
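
As a rough way to check the forwarding behavior described above, the sketch below queries the REST endpoint `/overview` on every JobManager and compares the answers. The host names in `JOBMANAGER_URLS` are hypothetical placeholders, port 8081 assumes the default REST port, and the helper names are made up for illustration. If every web server forwards to the leader, all endpoints should report the same numbers, although they may differ briefly during a failover.

```python
import json
from urllib.request import urlopen

# Hypothetical REST addresses of the three JobManager pods; replace with your own.
JOBMANAGER_URLS = [
    "http://jobmanager-0:8081",
    "http://jobmanager-1:8081",
    "http://jobmanager-2:8081",
]


def fetch_overview(base_url, timeout=5.0):
    """GET <base_url>/overview and return the decoded JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/overview", timeout=timeout) as resp:
        return json.load(resp)


def summarize(overviews):
    """Map each URL to the (taskmanagers, jobs-running) pair it reported."""
    return {
        url: (body.get("taskmanagers"), body.get("jobs-running"))
        for url, body in overviews.items()
    }


def all_agree(overviews):
    """True if every endpoint reported the same summary."""
    return len(set(summarize(overviews).values())) <= 1


def main():
    # Performs real HTTP requests, so it only works from inside the cluster network.
    overviews = {url: fetch_overview(url) for url in JOBMANAGER_URLS}
    for url, counts in summarize(overviews).items():
        print(url, counts)
    print("all endpoints agree:", all_agree(overviews))
```

Calling `main()` from a pod that can reach all three addresses prints one line per JobManager followed by the comparison result.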

One thing to add is that when uploading jars for the web submission, only the web server to which you uploaded the jar will see it.

Cheers,
Till

On Mon, Feb 15, 2021 at 9:38 AM Chirag Dewan <[hidden email]> wrote:
Hi,

We configured Job Manager HA with Kubernetes strategy and found that the Web UI for all 3 Job Managers is accessible on their configured rpc addresses. There's no information on the Web UI that suggests which Job Manager is the leader or task managers are registered to. However, from the logs I can see that Task Manager is registered with one Job Manager and if it's unavailable, Task Manager can switch to standby instance.

Having little to no experience on HA, I wanted to know if this is the expected behavior. I was assuming that only the leader Web UI would be accessible?

Thanks,
Chirag

Re: Understanding Job Manager Web UI in HA Mode

chiggi_dev
Thanks Till, that sounds fantastic. 

Is there any need for all Job Managers to see the jar after a job is running? 

I plan to sync the leader address from the config map and might always end up at the leader.

Thanks
Chirag

On Monday, 15 February, 2021, 03:16:50 pm IST, Till Rohrmann <[hidden email]> wrote:


Hi Chirag,

when starting standby JobManagers, then Flink will already start a web server for each process for serving REST requests. These servers will, however, not necessarily ask the JobManager they have been started with but always forward requests to the current leading JobManager. That way all web UIs are responsive but they all query the current leader. So for querying information you don't need to know which process is currently the leader.

One thing to add is that when uploading jars for the web submission, only the web server to which you uploaded the jar will see it.

Cheers,
Till


Re: Understanding Job Manager Web UI in HA Mode

Till Rohrmann
No, there is no need once the job has been submitted. It's only that web UI based submission is a two-step process in which you 1) upload the jar and 2) submit it. If you reach a different REST server between 1) and 2), that server won't know about the uploaded jar.
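
The two-step process above can be sketched against the REST API as follows, with both requests sent to the same base URL so that the server that stored the jar is also the one asked to run it. This is a minimal sketch, not a reference client: the helper names are hypothetical, the run request is sent with an empty JSON body on the assumption that the jar's manifest names the entry class, and error handling is omitted.

```python
import json
import os
import uuid
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/x-java-archive\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def jar_id_from_upload(response):
    """The jar id is the last path segment of the returned 'filename'."""
    return response["filename"].replace("\\", "/").rsplit("/", 1)[-1]


def upload_and_run(base_url, jar_path, timeout=30.0):
    """Step 1: upload the jar. Step 2: run it on the SAME REST server."""
    base = base_url.rstrip("/")
    with open(jar_path, "rb") as fh:
        body, ctype = build_multipart("jarfile", os.path.basename(jar_path), fh.read())

    upload = Request(f"{base}/jars/upload", data=body,
                     headers={"Content-Type": ctype}, method="POST")
    with urlopen(upload, timeout=timeout) as resp:
        jar_id = jar_id_from_upload(json.load(resp))

    # Assumption: no entry class or program arguments need to be passed.
    run = Request(f"{base}/jars/{quote(jar_id)}/run", data=b"{}",
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(run, timeout=timeout) as resp:
        return json.load(resp)["jobid"]
```

If the base URL points at a load-balanced Service rather than a single pod, the two requests may still land on different servers, which is exactly the case described above.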

Cheers,
Till

On Mon, Feb 15, 2021 at 11:41 AM Chirag Dewan <[hidden email]> wrote:
Thanks Till, that sounds fantastic. 

Is there any need for all Job Managers to see the jar after a job is running? 

I plan to sync the leader address from the config map and might always end up at the leader.

Thanks
Chirag


Re: Understanding Job Manager Web UI in HA Mode

Yang Wang
I think you could also configure the same Persistent Volume for all the JobManagers and mount it at /path/of/job-jars in the Pod.
After that, set the config option "web.upload.dir: /path/of/job-jars". This will make web submission work across multiple JobManagers.
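
A rough sketch of what this could look like is shown below. The volume name `job-jars`, the claim name `job-jars-pvc`, and the container name `jobmanager` are hypothetical, and the fragment shows only the relevant parts of a JobManager pod template. If the JobManager pods can be scheduled on different nodes, the underlying volume would need an access mode that permits mounting from several nodes (for example ReadWriteMany).

```yaml
# flink-conf.yaml, identical for every JobManager
web.upload.dir: /path/of/job-jars
```

```yaml
# Fragment of the JobManager pod template (names are hypothetical)
spec:
  containers:
    - name: jobmanager
      volumeMounts:
        - name: job-jars
          mountPath: /path/of/job-jars   # must match web.upload.dir
  volumes:
    - name: job-jars
      persistentVolumeClaim:
        claimName: job-jars-pvc          # one claim shared by all JobManagers
```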

Best,
Yang

On Tue, Feb 16, 2021 at 12:24 AM Till Rohrmann <[hidden email]> wrote:
No, there is no need after the job has been submitted. It's only that the web ui based submission is a two step process where you 1) upload the jar and 2) submit it. If you should access between 1) and 2) a different rest server, then the new rest server won't know about the uploaded jar.

Cheers,
Till
