(DEPRECATED) Apache Flink User Mailing List archive.

high-availability.jobmanager.port vs jobmanager.rpc.port

Elias Levy

high-availability.jobmanager.port vs jobmanager.rpc.port

I am wondering why, in HA mode, there is a need for a separate config parameter to set the JM RPC port (high-availability.jobmanager.port), and why this parameter accepts a range, unlike jobmanager.rpc.port.

Nico Kruber

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Hi Elias,
indeed that looks strange but was introduced with FLINK-3172 [1] with an
argument about using the same configuration key (as opposed to having two
different keys as mentioned) starting at
https://issues.apache.org/jira/browse/FLINK-3172?focusedCommentId=15091940#comment-15091940

Nico

[1] https://issues.apache.org/jira/browse/FLINK-3172

On Sunday, 24 September 2017 03:04:51 CEST Elias Levy wrote:
> I am wondering why, in HA mode, there is a need for a separate config parameter
> to set the JM RPC port (high-availability.jobmanager.port), and why this
> parameter accepts a range, unlike jobmanager.rpc.port.

Stephan Ewen

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Hi!

I think that can probably be simplified in the FLIP-6 case:

- All RPC is only between JM and TM and the port should be completely random (optionally within a range). TM and JM discover each other via HA (ZK) or the TM gets the JM RPC port as a parameter when the container is started.

(Parameter should be something like 'jobmanager.rpc.ports: 50000-51000')

- An exception is the standalone non-HA case, because there is no service-discovery mechanism. That should probably be a config key like 'standalone.jobmanager.rpc.port: 6123'

- The client calls come via HTTP/REST and should have one specific port that may optionally be discovered/redirected via YARN or the dispatchers.

/cc Till for your thoughts

Best,

Stephan
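
The scheme proposed above can be sketched roughly as follows. This is only an illustration of the proposal, not Flink code: the keys 'jobmanager.rpc.ports' and 'standalone.jobmanager.rpc.port' are the names suggested in this message, while the function choose_rpc_ports, the service_discovery flag, and the fallback to 6123 are assumptions made for the example.

```python
from typing import Dict, Tuple

# Key names as proposed in this thread; they are not confirmed Flink options.
RANGE_KEY = "jobmanager.rpc.ports"
STANDALONE_KEY = "standalone.jobmanager.rpc.port"


def choose_rpc_ports(config: Dict[str, str], service_discovery: bool) -> Tuple[int, int]:
    """Return the inclusive (low, high) port range the JM may bind to.

    (0, 0) means "let the operating system pick any free port".
    """
    if not service_discovery:
        # Standalone non-HA: TMs read the port from the config, so it must be fixed.
        # The 6123 fallback is an assumption for this sketch.
        port = int(config.get(STANDALONE_KEY, "6123"))
        return (port, port)

    spec = config.get(RANGE_KEY)
    if spec is None:
        # With service discovery the port can be completely random.
        return (0, 0)

    # Accept either a single port ("50000") or a range ("50000-51000").
    low, _, high = spec.partition("-")
    low_port = int(low)
    high_port = int(high) if high else low_port
    if low_port > high_port:
        raise ValueError(f"invalid port range: {spec}")
    return (low_port, high_port)
```

With service discovery available, choose_rpc_ports({"jobmanager.rpc.ports": "50000-51000"}, True) yields the configured range, and an empty config yields (0, 0), i.e. a random port.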

On Mon, Sep 25, 2017 at 3:31 PM, Nico Kruber <[hidden email]> wrote:

Hi Elias,
indeed that looks strange but was introduced with FLINK-3172 [1] with an
argument about using the same configuration key (as opposed to having two
different keys as mentioned) starting at
https://issues.apache.org/jira/browse/FLINK-3172?focusedCommentId=15091940#comment-15091940

Nico

[1] https://issues.apache.org/jira/browse/FLINK-3172

Stephan Ewen

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

/cc Till for real this time ;-)

Till Rohrmann

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Yes, with FLIP-6 it will most likely look the way Stephan described it. We need the explicit port in standalone mode so that TMs can connect to the JM. In the other deployment scenarios, the port can be randomly picked unless you want to specify a port range, e.g. for firewall configuration purposes.

However, if you look at it closely, then it is mainly a renaming of the existing configuration parameters:

- jobmanager.rpc.port -> standalone.jobmanager.rpc.port
- high-availability.jobmanager.port -> jobmanager.rpc.ports

Cheers,
Till
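
The renaming described above could be expressed as a simple key mapping. This is a minimal sketch: the new names are only the ones proposed in this thread and may not match what Flink ends up using, and rename_keys is a hypothetical helper written for illustration.

```python
from typing import Dict

# Old key -> proposed key, as listed in the message above.
# The proposed names come from this thread only; they are not confirmed Flink options.
PROPOSED_RENAMES: Dict[str, str] = {
    "jobmanager.rpc.port": "standalone.jobmanager.rpc.port",
    "high-availability.jobmanager.port": "jobmanager.rpc.ports",
}


def rename_keys(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of config with the old keys replaced by the proposed ones.

    Keys that are not part of the renaming are copied unchanged.
    """
    return {PROPOSED_RENAMES.get(key, key): value for key, value in config.items()}
```

The values are left untouched, which reflects the point that the change is mainly a renaming: a single port stays a single port and a range stays a range.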

On Mon, Sep 25, 2017 at 3:42 PM, Stephan Ewen <[hidden email]> wrote:

/cc Till for real this time ;-)

Elias Levy

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Why a range instead of just a single port in HA mode?

On Mon, Sep 25, 2017 at 1:49 PM, Till Rohrmann <[hidden email]> wrote:

Yes, with Flip-6 it will most likely look like how Stephan described it. We need the explicit port in standalone mode so that TMs can connect to the JM. In the other deployment scenarios, the port can be randomly picked unless you want to specify a port range, e.g. for firewall configuration purposes.

Till Rohrmann

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Because a single port could easily lead to clashes if there is another JobManager running on the same machine with the same port (e.g. due to standby JobManagers).

Cheers,

Till
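
To illustrate the clash described above, here is a minimal sketch of picking the first free port from a range. It uses plain TCP sockets from the Python standard library and is not how Flink binds its RPC endpoint; the helper names ports_in and bind_first_free and the loopback host are assumptions made for the example.

```python
import socket
from typing import Tuple


def ports_in(spec: str) -> range:
    """Return every port in a spec such as '50000-50025' or '6123'."""
    low, _, high = spec.partition("-")
    return range(int(low), int(high or low) + 1)


def bind_first_free(spec: str, host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a listening TCP socket to the first free port in the range.

    Returns the socket together with the port it was bound to.
    """
    for port in ports_in(spec):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is already taken, e.g. by another JobManager on this machine.
            sock.close()
            continue
        sock.listen()
        return sock, port
    raise OSError(f"no free port in range {spec}")
```

Calling bind_first_free twice with the same range on one machine gives the second caller (say, a standby JobManager) the next free port, whereas a spec containing a single port that is already in use raises OSError.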

On Sep 26, 2017 03:20, "Elias Levy" <[hidden email]> wrote:

Why a range instead of just a single port in HA mode?

Elias Levy

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

I presume then that the Job Managers and Task Managers are performing service discovery via ZooKeeper in HA mode, rather than from the config file or the masters file. Yes?

On Mon, Sep 25, 2017 at 11:14 PM, Till Rohrmann <[hidden email]> wrote:

Because a single port could easily lead to clashes if there is another JobManager running on the same machine with the same port (e.g. due to standby JobManagers).

Cheers,
Till

Till Rohrmann

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

Yes exactly.
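
The discovery flow confirmed here can be sketched as follows. This is a toy model only: InMemoryRegistry stands in for ZooKeeper, and the path '/leader/jobmanager' and the function names are hypothetical. Flink's actual leader election and retrieval are more involved.

```python
from typing import Dict, Optional

LEADER_PATH = "/leader/jobmanager"  # hypothetical path, for illustration only


class InMemoryRegistry:
    """Stand-in for the HA service (ZooKeeper in this thread); not a real client."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def publish(self, path: str, address: str) -> None:
        self._entries[path] = address

    def lookup(self, path: str) -> Optional[str]:
        return self._entries.get(path)


def announce_leader(registry: InMemoryRegistry, host: str, port: int) -> None:
    """Called by the leading JM after it has bound its RPC port."""
    registry.publish(LEADER_PATH, f"{host}:{port}")


def find_leader(registry: InMemoryRegistry) -> str:
    """Called by a TM: the JM address comes from the registry, not the config file."""
    address = registry.lookup(LEADER_PATH)
    if address is None:
        raise LookupError("no JobManager leader published yet")
    return address
```

Because the TM reads the address from the registry, the JM is free to bind any port in its range, and a standby JM that takes over simply publishes its own host and port.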

On Tue, Sep 26, 2017 at 5:07 PM, Elias Levy <[hidden email]> wrote:

I presume then that the Job Managers and Task Managers are performing service discovery via Zookeeper in HA mode, rather than from the config file or the masters file. Yes?
