JM fails connecting to TM Metrics service on AWS ECS

Rafi Aroch
Hi,

I have a Flink 1.9.0 cluster deployed on AWS ECS. The cluster is running, but metrics do not show up in the UI.

For the other services (RPC and data) this works, because the TM initiates the connection to the JM through a load balancer. It does not work for metrics, where the JM tries to initiate a connection to the TMs.

Currently, Flink uses the taskmanager.host configuration as both the 'bind address' and the 'advertised address'. When the TM starts, it binds to the internal Docker IP, which is not reachable from the JM.

Also, the TM's metrics.internal.query-service.port is set to a specific container port, which ECS dynamically maps to a random host port.

It seems that I need separate settings for the bind address/port and the advertised address/port.
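To make the bind vs. advertised distinction concrete, here is a generic Python sketch. It is not Flink code and does not use any Flink API. A service listens on a container-local address, which in the container would be the internal Docker IP (the sketch uses 127.0.0.1 and an OS-chosen port so it runs anywhere), and separately reports a different, externally reachable host:port to peers. `Endpoint`, `open_listener`, `registration_address`, and the values 10.0.1.23:32768 are hypothetical stand-ins for the ECS host IP and the host port ECS mapped to the container port.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def open_listener(bind: Endpoint) -> socket.socket:
    """Listen on the container-local address (the 'bind address')."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((bind.host, bind.port))
    sock.listen()
    return sock


def registration_address(advertised: Endpoint) -> str:
    """Address a remote peer (here, the JM) should dial (the 'advertised address').

    Hypothetically: the ECS host IP plus the host port that ECS mapped
    to the container port, which need not match what we bound to.
    """
    return f"{advertised.host}:{advertised.port}"


if __name__ == "__main__":
    # Bind locally on an OS-chosen port (port 0)...
    listener = open_listener(Endpoint("127.0.0.1", 0))
    bound_host, bound_port = listener.getsockname()
    # ...but tell peers about a different, externally reachable endpoint.
    advertised = Endpoint("10.0.1.23", 32768)  # hypothetical values
    print("bound to:", f"{bound_host}:{bound_port}")
    print("advertised:", registration_address(advertised))
    listener.close()
```

With a single setting like taskmanager.host, both roles collapse into one value. That is why the JM ends up dialing an address that only exists inside the container.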

I saw several discussions of this issue for Kubernetes as well: https://issues.apache.org/jira/browse/FLINK-11127
There was also an attempt to solve this by using Akka configurations here: https://hub.docker.com/r/lzaugg/flink-taskmanager/

Can someone suggest a solution for this issue on AWS ECS?

Would appreciate your help.

Thanks,
Rafi