Hi,

We're running 3 job managers in high-availability cluster mode backed by OpenStack/OpenShift. We currently expose all 3 job managers through 3 different routes (flink-1.domain.tld, flink-2.domain.tld, flink-3.domain.tld). When accessing the route of a job manager that isn't the leader, the user is automatically redirected to the host and port of the leading job manager. From what I've seen in the source code, the RPC address and port are used for the redirect. Since the internal hostnames are not accessible outside the cluster, this obviously doesn't work.

The nicest solution would be a single route (flink.domain.tld) that correctly delegates requests to the leading job manager. The second-best solution would probably be the option to declare a public URL in the Flink configuration file.

I'd be more than happy to contribute to Flink and add support for this, but I'd love to hear your ideas about it.

Kind regards

Marc

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
We intend to change the redirection behavior so that any job manager (leading or not) can accept requests and communicates internally with the leader. In this model you could set up flink.domain.tld to point to any job manager (or distribute requests among them).

Would this work for you?

I believe this is targeted for 1.5.

On 31.10.2017 13:55, mrooding wrote:
> [...]
I think you can solve this already with a health check (health monitor in OpenStack?). I'm currently using GET requests to / and if they don't reply with a 200 code the LB will not use them. Only the leader answers with a 200 code, whereas the others send a redirect with a 30x code, which should ensure that requests always go to the leader.

On 01.11.2017 12:34, Chesnay Schepler wrote:
> [...]

--
Jürgen Thomann
Software Developer

InnoGames GmbH
Friesenstraße 13 - 20097 Hamburg - Germany
Tel +49 40 7889335-0

Managing Directors: Hendrik Klindworth, Michael Zillmer
VAT-ID: DE264068907
Amtsgericht Hamburg, HRB 108973
http://www.innogames.com – [hidden email]
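The probe described above can be sketched as a small helper (a sketch only: the function name, default port 8081, and timeout are my assumptions; the leader-answers-200, others-redirect behavior is as described in the thread). It issues a plain GET / without following redirects, so a non-leading job manager's 30x answer counts as unhealthy:

```python
import http.client


def is_leader(host: str, port: int = 8081, timeout: float = 2.0) -> bool:
    """Return True only if this job manager answers GET / with 200 itself.

    Non-leaders reply with a 30x redirect to the leader's internal
    address; http.client does not follow redirects, so they (and any
    unreachable host) are reported as unhealthy and a load balancer
    using this check would skip them.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except OSError:
        return False  # connection refused / timed out: treat as unhealthy
    finally:
        conn.close()
    return status == 200
```

An OpenStack health monitor or Kubernetes readiness probe doing the equivalent HTTP check (expecting only 200) achieves the same effect without custom code.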
Chesnay, your solution is definitely the best approach. I was already wondering why the decision was made to support the UI through the leading job manager only.

Jürgen, I don't think that your solution will work in our setup. We're currently running 3 services, one for each job manager. We need a service per job manager because they obviously need to be able to talk to each other. In the latest version of OpenShift you can use a StatefulSet to handle these situations, but unfortunately, StatefulSets seem to rely on each node receiving its own persistent volume claim, whereas Flink seems to share 1 persistent volume claim for all nodes.

I've been going through the Kubernetes documentation about load balancers but I'm unable to find a solution which handles both cases:

- each node being available through a cluster name (e.g. flink-jobmanager-1.env.svc.cluster.local)
- exposing 1 URL which uses the load balancing solution proposed by you

Worst case, we would have to wait for Flink 1.5 and keep using 3 distinct URLs. It's not ideal, but there are also bigger fish to tackle.

Marc
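For what it's worth, the two requirements above can in principle be combined without a StatefulSet: a headless Service (together with `hostname`/`subdomain` fields on the pods) provides stable per-pod DNS names for the job managers' internal RPC, while a second, regular Service sits behind the single public route and relies on a leader-only readiness check to keep non-leaders out of rotation. A rough sketch, with all names, labels, and ports being illustrative assumptions rather than taken from an actual manifest:

```yaml
# Headless service: stable in-cluster DNS for jobmanager-to-jobmanager RPC.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-headless
spec:
  clusterIP: None
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: rpc
      port: 6123
---
# Regular service behind the single public route (flink.domain.tld).
# A readiness probe that only returns 200 on the leader keeps the
# non-leading job managers out of the endpoint list.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-ui
spec:
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: ui
      port: 8081
```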