Hi,

We're running 3 job managers in high-availability cluster mode backed by OpenStack/OpenShift. We currently expose all 3 job managers through 3 different routes (flink-1.domain.tld, flink-2.domain.tld, flink-3.domain.tld). When accessing the route of a job manager that isn't the leader, the user is automatically redirected to the host and port of the leading job manager. From what I've seen in the source code, the RPC address and port are used for the redirect. Since the internal hostnames are not accessible outside the cluster, this obviously doesn't work.

The nicest solution would be a single route (flink.domain.tld) that correctly delegates requests to the leading job manager. The second-best solution would probably be the option to declare a public URL in the Flink configuration file.

I'd be more than happy to contribute to Flink and add support for this, but I'd love to hear your ideas about it.

Kind regards

Marc

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
We intend to change the redirection behavior so that any job manager (leading or not) can accept requests and communicates internally with the leader. In this model you could set up flink.domain.tld to point to any job manager (or distribute requests among them).

Would this work for you?

I believe this is targeted for 1.5.

On 31.10.2017 13:55, mrooding wrote:
> [...]
I think you can solve this already with a health check (health monitor in OpenStack?). I'm currently using GET requests to / and if they don't reply with a 200 code the LB will not use them. Only the leader answers with a 200 code, whereas the others send a redirect with a 30x code, which should ensure that requests always go to the leader.

On 01.11.2017 12:34, Chesnay Schepler wrote:
> [...]

--
Jürgen Thomann
Software Developer

InnoGames GmbH
Friesenstraße 13 - 20097 Hamburg - Germany
Tel +49 40 7889335-0

Managing Directors: Hendrik Klindworth, Michael Zillmer
VAT-ID: DE264068907
Amtsgericht Hamburg, HRB 108973
http://www.innogames.com – [hidden email]
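The probe described above can be sketched as a small helper (a sketch only: the function name, default port 8081, and timeout are my assumptions; the leader-answers-200, others-redirect behavior is as described in the thread). It issues a plain GET / without following redirects, so a non-leading job manager's 30x answer counts as unhealthy:

```python
import http.client


def is_leader(host: str, port: int = 8081, timeout: float = 2.0) -> bool:
    """Return True only if this job manager answers GET / with 200 itself.

    Non-leaders reply with a 30x redirect to the leader's internal
    address; http.client does not follow redirects, so they (and any
    unreachable host) are reported as unhealthy and a load balancer
    using this check would skip them.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except OSError:
        return False  # connection refused / timed out: treat as unhealthy
    finally:
        conn.close()
    return status == 200
```

An OpenStack health monitor or Kubernetes readiness probe doing the equivalent HTTP check (expecting only 200) achieves the same effect without custom code.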
Chesnay, your solution is definitely the best approach. I was already wondering why the decision was made to support the UI through the leading job manager only.

Jürgen, I don't think that your solution will work in our setup. We're currently running 3 services, one for each job manager. We need a service per job manager because they obviously need to be able to talk to each other. In the latest version of OpenShift you can use a StatefulSet to handle these situations, but unfortunately, StatefulSets seem to rely on each node receiving its own persistent volume claim, whereas Flink seems to share 1 persistent volume claim for all nodes.

I've been going through the Kubernetes documentation about load balancers but I'm unable to find a solution which handles both cases:

- each node being available through a cluster name (e.g. flink-jobmanager-1.env.svc.cluster.local)
- exposing 1 URL which uses the load balancing solution proposed by you

Worst case, we would have to wait for Flink 1.5 and keep using 3 distinct URLs. It's not ideal, but there are also bigger fish to tackle.

Marc
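For what it's worth, the two requirements above can in principle be combined without a StatefulSet: a headless Service (together with `hostname`/`subdomain` fields on the pods) provides stable per-pod DNS names for the job managers' internal RPC, while a second, regular Service sits behind the single public route and relies on a leader-only readiness check to keep non-leaders out of rotation. A rough sketch, with all names, labels, and ports being illustrative assumptions rather than taken from an actual manifest:

```yaml
# Headless service: stable in-cluster DNS for jobmanager-to-jobmanager RPC.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-headless
spec:
  clusterIP: None
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: rpc
      port: 6123
---
# Regular service behind the single public route (flink.domain.tld).
# A readiness probe that only returns 200 on the leader keeps the
# non-leading job managers out of the endpoint list.
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-ui
spec:
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: ui
      port: 8081
```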