Hello, after upgrading the Flink Docker image from 1.9 to 1.11.1, the taskmanager hostnames reported in our metrics show up as IPs (e.g. 10.0.23.101) instead of hostnames. In the docker compose file we specify the hostname like this: hostname: "taskmanager-{{.Node.Hostname}}" Is there another way of achieving this? Regards, Nikola Hrusov
Hi Nikola, I'm not entirely sure how this happened. I would need more information to investigate, such as the complete taskmanager configuration in your docker compose file and the taskmanager logs. One quick thing you could try is to explicitly set the configuration option `taskmanager.host` for your taskmanagers and see whether that is reflected in the metrics. Thank you~ Xintong Song On Wed, Aug 12, 2020 at 3:06 PM Nikola Hrusov <[hidden email]> wrote:
Hi Xintong, I have tried setting taskmanager.host, but that actually makes things even worse. I have made a simple docker compose setup to make the problem easier to reproduce and explain. You can find the compose files here: https://github.com/nikobearrr/flink-hostname-metrics

There are two compose files, each of which starts a Flink jobmanager and taskmanager together with Graphite (for metrics): docker-compose.yml and docker-compose_with_hostname.yml. The only difference between them is line #29, which sets the taskmanager.host variable. Both expose port 8081 for the Flink cluster UI and port 8082 for the Graphite UI.

Running the setup without taskmanager.host

When you run the compose file without the taskmanager.host variable, the cluster starts just fine, the taskmanager registers, and running a job on the cluster works fine as well. The issue is that the Graphite UI shows the IP (in this case 172.20.0.3) instead of the hostname in the metrics. That was not the case with Flink versions prior to 1.11.

Running the setup with taskmanager.host

When I run the compose file that includes the taskmanager.host variable, I can see the cluster UI and it starts up fine. The metrics in Graphite also come through correctly, with the hostname. However, now something else is wrong. First, when you go to http://localhost:8081/#/task-manager/70704dc334ac8007925409c575e42d7d/metrics, where 70704dc334ac8007925409c575e42d7d is the ID of the taskmanager, I start getting these logs in my console:

jobmanager_1 | 2020-10-27 15:59:44,366 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink-metrics@taskmanager-node01:42269] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@taskmanager-node01:42269]] Caused by: [java.net.UnknownHostException: taskmanager-node01: Name or service not known]

Also, the metrics for the taskmanager do not show up on that page, as shown in the picture above.
If you do not use "taskmanager.host", the metrics show up and there are no such UnknownHostException WARN logs.

More importantly, we also see what seems to be the same issue when we run batch jobs: the jobs fail on submission. This only happens when we explicitly set the "taskmanager.host" variable. From the taskmanager:

2020-10-27T17:11:55.646Z [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService - Add job 6b6f5ee3a1ab7556fd0db64de0f7cb1d for job leader monitoring.

From the jobmanager:

2020-10-27T17:11:55.670Z [flink-akka.actor.default-dispatcher-36] INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)

Both the taskmanager and the jobmanager agree on one thing: "taskmanager-node01: No address associated with hostname". Setting the hostname explicitly fixes the metrics in Graphite, but then job submission/execution does not work, which is even worse than not having the metrics.

So my question is: is there anything else that needs to be set when using the taskmanager.host config? Or am I doing something wrong in the setup?

Regards, Nikola Hrusov On Fri, Aug 14, 2020 at 6:26 AM Xintong Song <[hidden email]> wrote:
Hello, I am still trying to figure out how to properly set up a cluster with Flink 1.11 and receive metrics with hostnames. As outlined in my previous email, I currently have to choose between a) receiving proper metrics and b) running my jobs. Ideally I should be able to do both, as was possible with Flink 1.10. Can somebody shed some light on this matter? Regards, Nikola Hrusov On Tue, Oct 27, 2020 at 9:35 PM Nikola Hrusov <[hidden email]> wrote:
Hey Nikola, sorry for the delayed response. I just tried the docker-compose files you provided, and "docker-compose -f docker-compose.yml up" works for me: metrics are shown in the UI, and I'm able to submit jobs via the web UI and the command line client. I got "docker-compose_with_hostname.yml" to work after the following change:

diff --git a/docker-compose_with_hostname.yml b/docker-compose_with_hostname.yml
index d876cb5..c34073f 100644
--- a/docker-compose_with_hostname.yml
+++ b/docker-compose_with_hostname.yml
@@ -26,7 +26,7 @@ services:
       FLINK_PROPERTIES=
       jobmanager.rpc.address: jobmanager
       taskmanager.numberOfTaskSlots: 2
-      taskmanager.host: "taskmanager-node01"
+      taskmanager.host: "taskmanager01"
       metrics.reporter.grph.factory.class: org.apache.flink.metrics.graphite.GraphiteReporterFactory
       metrics.reporter.grph.host: graphite
       metrics.reporter.grph.port: 2003

The name of the docker-compose service is the hostname (that is the name the other containers can resolve). On Tue, Nov 3, 2020 at 5:24 PM Nikola Hrusov <[hidden email]> wrote:
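To catch this kind of mismatch before starting the containers, one could statically compare taskmanager.host against the compose service names. The following is a minimal Python sketch, not a Flink or Docker tool: parse_flink_properties and check_taskmanager_host are hypothetical helpers, the FLINK_PROPERTIES value is assumed to hold one "key: value" pair per line as in the diff above, and the service names jobmanager, taskmanager01 and graphite are assumptions inferred from the diff and the Graphite reporter settings.

```python
"""Hypothetical pre-flight check: does taskmanager.host name a compose service?"""
from __future__ import annotations


def parse_flink_properties(text: str) -> dict[str, str]:
    """Parse 'key: value' lines; quotes around values are stripped."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and the "FLINK_PROPERTIES=" header line.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        props[key.strip()] = value.strip().strip("\"'")
    return props


def check_taskmanager_host(props: dict[str, str], services: set[str]) -> str | None:
    """Return an error message if taskmanager.host is set to a non-service name."""
    host = props.get("taskmanager.host")
    if host is None or host in services:
        return None  # option not set, or set to a compose service name
    return (f"taskmanager.host={host!r} is not a compose service name "
            f"{sorted(services)}; other containers probably cannot resolve it")


if __name__ == "__main__":
    # Assumed service names, inferred from the diff and the Graphite reporter config.
    services = {"jobmanager", "taskmanager01", "graphite"}
    for value in ("taskmanager-node01", "taskmanager01"):
        props = parse_flink_properties(
            f"FLINK_PROPERTIES=\njobmanager.rpc.address: jobmanager\ntaskmanager.host: \"{value}\"\n"
        )
        print(value, "->", check_taskmanager_host(props, services))
```

With the original value "taskmanager-node01" the check returns an error message, and with "taskmanager01" it returns None. It only compares names, so it ignores other ways a container can become resolvable, such as network aliases.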
I hope it's fine that I moved our discussion back to the list. You cannot put an arbitrary hostname into the Flink configuration key "taskmanager.host". It must be a valid hostname that is resolvable within the Flink cluster, so that the RPC services can reach each other. I don't think there's a way to define a custom taskmanager name in the Flink metrics. I'm adding Chesnay to the conversation, since he's very familiar with the metrics system. On Wed, Nov 4, 2020 at 6:08 PM Nikola Hrusov <[hidden email]> wrote:
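Since the requirement is that the name resolves inside the cluster, one way to test a candidate value is to try resolving it from inside one of the containers, for example the jobmanager. A minimal Python sketch, assuming a Python interpreter is available in that container; resolves() and main() are hypothetical helpers, and the check covers only name resolution, not whether the RPC ports are reachable.

```python
"""Hypothetical pre-flight check: do candidate taskmanager.host values resolve?"""
from __future__ import annotations

import socket
import sys


def resolves(hostname: str) -> bool:
    """Return True if this container's resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


def main(hostnames: list[str]) -> int:
    """Report every hostname that does not resolve; return 1 if any failed."""
    failed = [name for name in hostnames if not resolves(name)]
    for name in failed:
        print(f"cannot resolve {name!r}: RPC connections to it would likely fail")
    return 1 if failed else 0


if __name__ == "__main__":
    # Example (hypothetical script name): python check_host.py taskmanager01 taskmanager-node01
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

In the setup from this thread, a name such as taskmanager-node01 that only exists as the taskmanager's own Docker hostname would be expected to fail here, consistent with the UnknownHostException in the jobmanager logs.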
There is no convenient cosmetic way to achieve what you want. The only approach that would currently work is hard-coding the host into the configuration of each taskmanager via the metrics.scope.* configuration options.
On 11/4/2020 8:14 PM, Robert Metzger wrote:
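Chesnay's suggestion can be sketched as taking the taskmanager-side scope formats and replacing the <host> variable with a fixed string per taskmanager. This is a minimal Python sketch under stated assumptions: the templates are assumed to be Flink's documented default system scope formats for metrics.scope.tm, metrics.scope.tm.job, metrics.scope.task and metrics.scope.operator (verify them against the documentation for your Flink version), and if your metric paths follow a different layout, such as the flink.host.<host>... paths mentioned later in this thread, start from the templates that produce it instead. hardcode_host and to_properties are hypothetical helpers.

```python
"""Hypothetical helper: bake a fixed host name into Flink's taskmanager scope formats."""
from __future__ import annotations

# Assumed to match Flink's documented default system scope formats; verify for your version.
DEFAULT_TM_SCOPES = {
    "metrics.scope.tm": "<host>.taskmanager.<tm_id>",
    "metrics.scope.tm.job": "<host>.taskmanager.<tm_id>.<job_name>",
    "metrics.scope.task": "<host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>",
    "metrics.scope.operator": "<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>",
}


def hardcode_host(scopes: dict[str, str], host: str) -> dict[str, str]:
    """Replace the <host> variable with a literal; all other variables stay dynamic."""
    return {key: fmt.replace("<host>", host) for key, fmt in scopes.items()}


def to_properties(scopes: dict[str, str]) -> str:
    """Render the options as flink-conf.yaml style 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in scopes.items())


if __name__ == "__main__":
    # "taskmanager-node01" is just an example value; use one name per taskmanager.
    print(to_properties(hardcode_host(DEFAULT_TM_SCOPES, "taskmanager-node01")))
```

The printed lines would go into that taskmanager's FLINK_PROPERTIES. Because the literal is baked into the configuration, every taskmanager needs its own value, which is the hard-coding described above.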
Hello, thank you both for your input. I accidentally pressed Reply instead of Reply All; thanks for bringing the discussion back to the user list. As it stands, there are two ways to configure the hostname: 1) Docker's hostname property on a service, and 2) Flink's explicit taskmanager.host configuration. Prior to 1.11, the taskmanager.host variable was not needed. The cluster did not seem to take the Docker hostname into consideration, because under jobmanager -> taskmanagers I could see the taskmanagers listed by their internal IPs. So the default internal IPs were used by the cluster, while Docker's hostname attribute was used for the metrics. The metrics format was flink.host.<host>.job.xxxx, and <host> was replaced with the value I had put in the Docker hostname attribute. My only issue is that I couldn't find anything in the documentation about such a change, and suddenly my metrics were no longer using Docker's hostname. I will try using the metrics scope options and pass the hostname there instead. Regards, Nikola Hrusov On Thu, Nov 5, 2020 at 1:31 AM Chesnay Schepler <[hidden email]> wrote: