Adding metric-query port makes it a bit better, but there is still an error019-02-22 00:03:56,173 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..2019-02-22 00:04:16,213 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..2019-02-22 00:04:36,253 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..2019-02-22 00:04:56,293 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify”..In the task manager and2019-02-21 23:59:46,479 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]2019-02-21 23:59:57,808 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]2019-02-22 00:00:06,519 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]2019-02-22 00:00:17,849 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]2019-02-22 00:00:26,558 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]2019-02-22 00:00:37,888 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]I the job managerPort 6123 is opened in both Job Manager deploymentapiVersion: extensions/v1beta1kind: Deploymentmetadata:name: {{ template "fullname" . }}-jobmanagerspec:replicas: 1template:metadata:annotations:prometheus.io/scrape: 'true'prometheus.io/port: '9249'labels:server: flinkapp: {{ template "fullname" . }}component: jobmanagerspec:containers:- name: jobmanagerimage: {{ .Values.image }}:{{ .Values.imageTag }}imagePullPolicy: {{ .Values.imagePullPolicy }}args:- jobmanagerports:- containerPort: 6123name: rpc- containerPort: 6124name: blob- containerPort: 8081name: uienv:- name: CONTAINER_METRIC_PORTvalue: '{{ .Values.flink.metric_query_port }}'- name: JOB_MANAGER_RPC_ADDRESSvalue : {{ template "fullname" . }}-jobmanagerlivenessProbe:httpGet:path: /overviewport: 8081initialDelaySeconds: 30periodSeconds: 10resources:limits:cpu: {{ .Values.resources.jobmanager.limits.cpu }}memory: {{ .Values.resources.jobmanager.limits.memory }}requests:cpu: {{ .Values.resources.jobmanager.requests.cpu }}memory: {{ .Values.resources.jobmanager.requests.memory }}And Job manager serviceapiVersion: v1kind: Servicemetadata:name: {{ template "fullname" . }}-jobmanagerspec:ports:- name: rpcport: 6123- name: blobport: 6124- name: uiport: 8081selector:app: {{ template "fullname" . }}component: jobmanagerOn Feb 21, 2019, at 6:13 PM, Boris Lublinsky <[hidden email]> wrote:On Feb 21, 2019, at 2:05 AM, Konstantin Knauf <[hidden email]> wrote:Hi Boris,the exact command depends on the docker-entrypoint.sh script and the image you are using. For the example contained in the Flink repository it is "task-manager", I think. The important thing is to pass "taskmanager.host" to the Taskmanager process. You can verify by checking the Taskmanager logs. These should contain lines like below:2019-02-21 08:03:00,004 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Program Arguments:
2019-02-21 08:03:00,008 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - -Dtaskmanager.host=10.12.10.173In the Jobmanager logs you should see that the Taskmanager is registered under the IP above in a line similar to:2019-02-21 08:03:26,874 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda (akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManagerA service per Taskmanager is not required. The purpose of the config parameter is that the Jobmanager addresses the taskmanagers by IP instead of hostname.Hope this helps!Cheers,KonstantinOn Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky <[hidden email]> wrote:Also, The suggested workaround does not quite work.
2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@flink-taskmanager-1:6170] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by: [flink-taskmanager-1: No address associated with hostname] 2019-02-20 15:27:48,750 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Caught exception I think the problem is that its trying to connect to flink-task-manager-1Using busybody to experiment with nslookup, I can see/ # nslookup flink-taskmanager-1.flink-taskmanagerServer: 10.0.11.151Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internalName: flink-taskmanager-1.flink-taskmanagerAddress 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local/ # nslookup flink-taskmanager-1Server: 10.0.11.151Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internalnslookup: can't resolve 'flink-taskmanager-1'/ # nslookup flink-taskmanager-0.flink-taskmanagerServer: 10.0.11.151Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internalName: flink-taskmanager-0.flink-taskmanagerAddress 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local/ # nslookup flink-taskmanager-0Server: 10.0.11.151Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internalnslookup: can't resolve 'flink-taskmanager-0'/ #So the name should be postfixed with the service name. How do I force it? I suspect I am missing config parameterOn Feb 19, 2019, at 4:33 AM, Konstantin Knauf <[hidden email]> wrote:Hi Boris,the solution is actually simpler than it sounds from the ticket. The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter.The Deployment spec could looks something like this:containers:
- name: taskmanager
[...]
args:
- "taskmanager.sh"
- "start-foreground"
- "-Dtaskmanager.host=$(K8S_POD_IP)"
[...]env:
- name: K8S_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIPHope this helps and let me know if this works.Best,KonstantinOn Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <[hidden email]> wrote:I was looking at this issue https://issues.apache.org/jira/browse/FLINK-11127Apparently there is a workaround for it.Is it possible provide the complete helm chart for it.Bits and pieces are in the ticket, but it would be nice to see the full chart
--Konstantin Knauf | Solutions Architect+49 160 91394525Follow us @VervericaData--Stream Processing | Event Driven | Real Time--Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany--Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
--Konstantin Knauf | Solutions Architect+49 160 91394525Follow us @VervericaData--Stream Processing | Event Driven | Real Time--Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany--Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
Free forum by Nabble | Edit this page |