Flink on Kubernetes - Hostname resolution between job/tasks-managers

bastien dine
Hello,
I am trying to install Flink on Kubernetes, and it is almost working.
I am using the Kubernetes resource files from the Flink 1.7.1 documentation.

My cluster starts fine, and my 2 TaskManagers register successfully with the JobManager.
In the web UI, I can see them:
akka.tcp://flink@dev-flink-taskmanager-3717639837-gvwh4:37057/user/taskmanager_0

I can submit a job too.
But when I open the job details or try to load the logs, nothing is shown, and the JobManager log contains plenty of errors like:

2019-01-15 14:12:40.111 [flink-metrics-96] WARN akka.remote.ReliableDeliverySupervisor flink-metrics-akka.remote.default-remote-dispatcher-113 - Association with remote system [akka.tcp://flink-metrics@dev-flink-taskmanager-3717639837-gvwh4:40508] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@dev-flink-taskmanager-3717639837-gvwh4:40508]] Caused by: [dev-flink-taskmanager-3717639837-gvwh4: Name does not resolve]

-> Name does not resolve.
So I tried to ping the pod hostname, and it does not work.
However, pinging the pod's IP works.
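
The same check can be scripted from inside the JobManager pod. The following is only a minimal sketch, assuming Python 3 is available in the container; the helper names akka_host_port and resolves are hypothetical and not part of Flink. It extracts the host and port from an Akka address like the one shown in the web UI and reports whether the host name resolves.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def akka_host_port(address: str) -> Tuple[str, int]:
    """Extract (host, port) from an Akka address such as
    akka.tcp://flink@some-host:37057/user/taskmanager_0."""
    parts = urlsplit(address)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host:port found in {address!r}")
    return parts.hostname, parts.port


def resolves(host: str) -> bool:
    """Return True if the local resolver can map host to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # This is the "Name does not resolve" case from the log.
        return False
```

Calling resolves(akka_host_port(address)[0]) with the TaskManager address from the web UI would be expected to return False in the situation described here, while resolves on the pod's IP returns True, since an IP literal needs no DNS lookup.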

So, my questions are:
- Can we force the use of IPv4 addresses instead of hostname resolution? (It would also be better for performance.)
- If not, do I need to add a service or something to make it work?

Best Regards,
Bastien

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io

Re: Flink on Kubernetes - Hostname resolution between job/tasks-managers

bastien dine
Never mind.
This problem was already discussed in the thread:
"Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment"


------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io


On Tue, 15 Jan 2019 at 15:16, bastien dine <[hidden email]> wrote: