Problem with metrics inside Kubernetes

Steven Nelson

I have been working with Flink under Kubernetes recently and have run into some problems with metrics. I think I have it figured out, though. It appears that the jobmanager is trying to reach the taskmanagers by hostname, and those hostnames (the pod names) do not resolve from the jobmanager. This causes the following error:

Association with remote system [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028]] Caused by: [flink-taskmanager-7dffcf7975-vb2pc]

I noticed that once I added hosts file entries on the jobmanager for each of the taskmanagers, everything started working. Is there a way to specify the hostname of a taskmanager, like you can for the jobmanager?
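To make the hosts-file workaround concrete, here is a minimal sketch that turns a mapping of taskmanager pod names to pod IPs into hosts file lines. The function name `format_hosts_entries` and the IP address `10.1.2.3` are hypothetical and only for illustration; how the mapping is obtained and how the lines end up on the jobmanager are left out, since that depends on the deployment.

```python
import ipaddress


def format_hosts_entries(pod_ips):
    """Return hosts-file lines ("<ip><TAB><hostname>"), one per taskmanager pod.

    pod_ips maps a pod hostname to its IP address (hypothetical input;
    collecting it from the cluster is outside the scope of this sketch).
    """
    lines = []
    for hostname, ip in sorted(pod_ips.items()):
        # ip_address() raises ValueError for a malformed address,
        # so a bad entry never reaches the hosts file.
        ipaddress.ip_address(ip)
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Pod name taken from the error message above; the IP is made up.
    example = {"flink-taskmanager-7dffcf7975-vb2pc": "10.1.2.3"}
    print(format_hosts_entries(example), end="")
```

The output would be appended to the hosts file on the jobmanager. Because pod names and IPs change whenever a taskmanager pod is rescheduled, this is a stopgap rather than a fix.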

-Steve

Re: Problem with metrics inside Kubernetes

Derek VerLee

See the reply I just posted in the thread "Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment".

On 1/2/19 11:19 AM, Steven Nelson wrote:

I have been working with Flink under Kubernetes recently and have run into some problems with metrics. I think I have it figured out, though. It appears that the jobmanager is trying to reach the taskmanagers by hostname, and those hostnames (the pod names) do not resolve from the jobmanager. This causes the following error:

Association with remote system [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-7dffcf7975-vb2pc:42028]] Caused by: [flink-taskmanager-7dffcf7975-vb2pc]

I noticed that once I added hosts file entries on the jobmanager for each of the taskmanagers, everything started working. Is there a way to specify the hostname of a taskmanager, like you can for the jobmanager?

-Steve