high-availability.jobmanager.port vs jobmanager.rpc.port


Elias Levy
I am wondering why, in HA mode, there is a need for a separate config parameter to set the JM RPC port (high-availability.jobmanager.port), and why this parameter accepts a range, unlike jobmanager.rpc.port.


Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Nico Kruber
Hi Elias,
indeed, that looks strange, but it was introduced with FLINK-3172 [1]. The
discussion about using the same configuration key (as opposed to having two
different keys, as mentioned) starts at
https://issues.apache.org/jira/browse/FLINK-3172?focusedCommentId=15091940#comment-15091940


Nico

[1] https://issues.apache.org/jira/browse/FLINK-3172



Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Stephan Ewen
Hi!

I think that can probably be simplified in the FLIP-6 case:

  - All RPC is only between JM and TM and the port should be completely random (optionally within a range). TM and JM discover each other via HA (ZK) or the TM gets the JM RPC port as a parameter when the container is started.
  (Parameter should be something like 'jobmanager.rpc.ports: 50000-51000')

  - An exception is the standalone non-HA case, because there is no service-discovery mechanism. That should probably be a config key like 'standalone.jobmanager.rpc.port: 6123'

  - The client calls come via HTTP/REST and should have one specific port that may optionally be discovered/redirected via YARN or the dispatchers.
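
To make the range notation in the first bullet concrete, here is a minimal Python sketch of how a value such as '50000-51000' could be expanded into candidate ports. The helper name parse_port_spec is hypothetical and this is not Flink's own parser; it only assumes that a value is a single port, a comma-separated list, a low-high range, or a mix of these.

```python
from typing import List


def parse_port_spec(spec: str) -> List[int]:
    """Expand a port spec such as "6123", "50100,50101" or "50000-51000".

    Hypothetical helper for illustration only; not Flink's own parser.
    """
    ports: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range "low-high" is inclusive on both ends.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            ports.extend(range(low, high + 1))
        else:
            ports.append(int(part))
    for port in ports:
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ports
```

With this sketch, the standalone value '6123' yields exactly one candidate, while '50000-51000' yields 1001 candidates from which a process could pick.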

/cc Till for your thoughts

Best,
Stephan





Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Stephan Ewen
/cc Till for real this time ;-)





Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Till Rohrmann

Yes, with FLIP-6 it will most likely look the way Stephan described it. We need the explicit port in standalone mode so that TMs can connect to the JM. In the other deployment scenarios, the port can be picked randomly unless you want to specify a port range, e.g. for firewall configuration purposes.

However, if you look at it closely, it is mainly a renaming of the existing configuration parameters: jobmanager.rpc.port -> standalone.jobmanager.rpc.port and high-availability.jobmanager.port -> jobmanager.rpc.ports.

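As a rough illustration of that renaming, the following Python sketch rewrites a configuration dictionary from the existing keys to the names proposed in this thread. The new key names are only the proposal discussed here, not confirmed Flink options, and rename_keys is a hypothetical helper.

```python
from typing import Dict

# Renaming as proposed in this thread. The target names are a proposal,
# not confirmed Flink configuration options.
PROPOSED_RENAMES: Dict[str, str] = {
    "jobmanager.rpc.port": "standalone.jobmanager.rpc.port",
    "high-availability.jobmanager.port": "jobmanager.rpc.ports",
}


def rename_keys(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `config` with the proposed key names applied.

    Keys without a proposed new name are copied unchanged.
    """
    return {PROPOSED_RENAMES.get(key, key): value for key, value in config.items()}
```
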
Cheers,
Till






Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Elias Levy
Why a range instead of just a single port in HA mode?



Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Till Rohrmann
Because a single port could easily lead to clashes if there is another JobManager running on the same machine with the same port (e.g. due to standby JobManagers).
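
The clash can be sketched in a few lines of Python: a process walks the configured ports and binds the first one that is still free, so a second process on the same machine simply ends up on the next port. This only illustrates the idea; bind_first_free is a hypothetical helper and not how Flink implements its port selection.

```python
import socket
from typing import Iterable, Tuple


def bind_first_free(ports: Iterable[int], host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Return a listening socket bound to the first free port in `ports`.

    Hypothetical helper for illustration only.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port already taken, e.g. by another JobManager on this machine.
            sock.close()
            continue
        sock.listen()
        return sock, sock.getsockname()[1]
    raise OSError("no free port in the configured range")
```

With a single configured port, the second process has nothing to fall back to and fails with an error. With a range, it moves on to the next candidate.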

Cheers,
Till



Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Elias Levy
I presume, then, that the JobManagers and TaskManagers perform service discovery via ZooKeeper in HA mode, rather than reading the address from the config file or the masters file. Yes?




Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Till Rohrmann
Yes, exactly.
