Re: Jira issue Flink-11127

Posted by Boris Lublinsky on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Jira-issue-Flink-11127-tp26180p26249.html

Also, The suggested workaround does not quite work.
2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [<a href="akka.tcp://flink-metrics@flink-taskmanager-1:6170" class="">akka.tcp://flink-metrics@flink-taskmanager-1:6170] has failed, address is now gated for [50] ms. Reason: [Association failed with [<a href="akka.tcp://flink-metrics@flink-taskmanager-1:6170" class="">akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by: [flink-taskmanager-1: No address associated with hostname]
2019-02-20 15:27:48,750 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Caught exception

I think the problem is that its trying to connect to flink-task-manager-1

Using busybody to experiment with nslookup, I can see
/ # nslookup flink-taskmanager-1.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-1.flink-taskmanager
Address 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-1
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-1'
/ # nslookup flink-taskmanager-0.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-0.flink-taskmanager
Address 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-0
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-0'
/ # 

So the name should be postfixed with the service name. How do I force it? I suspect I am missing config parameter

 
Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Feb 19, 2019, at 4:33 AM, Konstantin Knauf <[hidden email]> wrote:

Hi Boris,

the solution is actually simpler than it sounds from the ticket. The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter. 

The Deployment spec could looks something like this:
containers:
- name: taskmanager
[...]
args:
- "taskmanager.sh"
- "start-foreground"
- "-Dtaskmanager.host=$(K8S_POD_IP)"
[...]
  env:
- name: K8S_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP

Hope this helps and let me know if this works. 

Best, 

Konstantin

On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <[hidden email]> wrote:
Apparently there is a workaround for it.
Is it possible provide the complete helm chart for it.
Bits and pieces are in the ticket, but it would be nice to see the full chart

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/



--
Konstantin Knauf | Solutions Architect
+49 160 91394525


Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen