Posted by
Boris Lublinsky on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Jira-issue-Flink-11127-tp26180p26283.html
Adding metric-query port makes it a bit better, but there is still an error
019-02-22 00:03:56,173 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address <a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-22 00:04:16,213 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address <a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-22 00:04:36,253 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address <a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-22 00:04:56,293 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address <a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify”..
In the task manager and
2019-02-21 23:59:46,479 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
2019-02-21 23:59:57,808 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
2019-02-22 00:00:06,519 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
2019-02-22 00:00:17,849 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
2019-02-22 00:00:26,558 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
2019-02-22 00:00:37,888 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/]] arriving at [<a href="akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123" class="">akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123] inbound addresses are [<a href="akka.tcp://flink@127.0.0.1:6123" class="">akka.tcp://flink@127.0.0.1:6123]
I the job manager
Port 6123 is opened in both Job Manager deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: {{ template "fullname" . }}-jobmanager
spec:
replicas: 1
template:
metadata:
annotations:
labels:
server: flink
app: {{ template "fullname" . }}
component: jobmanager
spec:
containers:
- name: jobmanager
image: {{ .Values.image }}:{{ .Values.imageTag }}
imagePullPolicy: {{ .Values.imagePullPolicy }}
args:
- jobmanager
ports:
- containerPort: 6123
name: rpc
- containerPort: 6124
name: blob
- containerPort: 8081
name: ui
env:
- name: CONTAINER_METRIC_PORT
value: '{{ .Values.flink.metric_query_port }}'
- name: JOB_MANAGER_RPC_ADDRESS
value : {{ template "fullname" . }}-jobmanager
livenessProbe:
httpGet:
path: /overview
port: 8081
initialDelaySeconds: 30
periodSeconds: 10
resources:
limits:
cpu: {{ .Values.resources.jobmanager.limits.cpu }}
memory: {{ .Values.resources.jobmanager.limits.memory }}
requests:
cpu: {{ .Values.resources.jobmanager.requests.cpu }}
memory: {{ .Values.resources.jobmanager.requests.memory }}
And Job manager service
apiVersion: v1
kind: Service
metadata:
name: {{ template "fullname" . }}-jobmanager
spec:
ports:
- name: rpc
port: 6123
- name: blob
port: 6124
- name: ui
port: 8081
selector:
app: {{ template "fullname" . }}
component: jobmanager
Hi Boris,
the exact command depends on the docker-entrypoint.sh script and the image you are using. For the example contained in the Flink repository it is "task-manager", I think. The important thing is to pass "taskmanager.host" to the Taskmanager process. You can verify by checking the Taskmanager logs. These should contain lines like below:
2019-02-21 08:03:00,004 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Program Arguments:
2019-02-21 08:03:00,008 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - -Dtaskmanager.host=10.12.10.173
In the Jobmanager logs you should see that the Taskmanager is registered under the IP above in a line similar to:
2019-02-21 08:03:26,874 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda (akka.tcp://
flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManager
A service per Taskmanager is not required. The purpose of the config parameter is that the Jobmanager addresses the taskmanagers by IP instead of hostname.
Hope this helps!
Cheers,
Konstantin
On Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky <
[hidden email]> wrote:
Also, The suggested workaround does not quite work.
I think the problem is that its trying to connect to flink-task-manager-1
Using busybody to experiment with nslookup, I can see
/ # nslookup flink-taskmanager-1.flink-taskmanager
Server: 10.0.11.151
Name: flink-taskmanager-1.flink-taskmanager
Address 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-1
Server: 10.0.11.151
nslookup: can't resolve 'flink-taskmanager-1'
/ # nslookup flink-taskmanager-0.flink-taskmanager
Server: 10.0.11.151
Name: flink-taskmanager-0.flink-taskmanager
Address 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-0
Server: 10.0.11.151
nslookup: can't resolve 'flink-taskmanager-0'
/ #
So the name should be postfixed with the service name. How do I force it? I suspect I am missing config parameter
Hi Boris,
the solution is actually simpler than it sounds from the ticket. The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter.
The Deployment spec could looks something like this:
containers:
- name: taskmanager
[...]
args:
- "taskmanager.sh"
- "start-foreground"
- "-Dtaskmanager.host=$(K8S_POD_IP)"
[...]
env:
- name: K8S_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
Hope this helps and let me know if this works.
Best,
Konstantin
On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <
[hidden email]> wrote:
Apparently there is a workaround for it.
Is it possible provide the complete helm chart for it.
Bits and pieces are in the ticket, but it would be nice to see the full chart
--
Konstantin Knauf | Solutions Architect
+49 160 91394525
Follow us @VervericaData
--
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
--
Konstantin Knauf | Solutions Architect
+49 160 91394525
Follow us @VervericaData
--
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen