Hi,I’m going to deploy flink on minikube referring to https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html; kubectl create -f flink-configuration-configmap.yaml kubectl create -f jobmanager-service.yaml kubectl create -f jobmanager-session-deployment.yaml kubectl create -f taskmanager-session-deployment.yaml But I got this 2020-09-02 06:45:42,664 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [<a href="akka.tcp://flink@flink-jobmanager:6123" class="">akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [<a href="akka.tcp://flink@flink-jobmanager:6123" class="">akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution] 2020-09-02 06:45:42,691 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address <a href="akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*" class="">akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address <a href="akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*" class="">akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*. 2020-09-02 06:46:02,731 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address <a href="akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*" class="">akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address <a href="akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*" class="">akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*. 2020-09-02 06:46:12,731 INFO akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms]. And when I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash’ && ‘ping flink-jobmanager’ , I find I cannot ping flink-jobmanager from taskmanager I am new to k8s, can anyone give me some tutorial? Thanks a lot ! |
Hi art, could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using. Cheers, Till On Wed, Sep 2, 2020 at 9:50 AM art <[hidden email]> wrote:
|
Hi art, could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services. But in your case there are no services listed. You have see something like service/flink-jobmanager otherwise the flink-jobmanager service (K8s service) is not running. Cheers, Till On Wed, Sep 2, 2020 at 11:15 AM art <[hidden email]> wrote:
|
Hi Till,
The full information when I run command ' kubectl get all’ like this: NAME READY STATUS RESTARTS AGE pod/flink-jobmanager-85bdbd98d8-ppjmf 1/1 Running 0 2m34s pod/flink-taskmanager-74c68c6f48-6jb5v 1/1 Running 0 2m34s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/flink-jobmanager ClusterIP 10.103.207.75 <none> 6123/TCP,6124/TCP,8081/TCP 2m34s service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d2h NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/flink-jobmanager 1/1 1 1 2m34s deployment.apps/flink-taskmanager 1/1 1 1 2m34s NAME DESIRED CURRENT READY AGE replicaset.apps/flink-jobmanager-85bdbd98d8 1 1 1 2m34s replicaset.apps/flink-taskmanager-74c68c6f48 1 1 1 2m34s And I can open flink ui but the task manger is 0 ,so the job manger is work well I think the problem is taksmanger can not register itself to jobmanger, did I miss some configure?
|
Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot. Cheers, Till On Wed, Sep 2, 2020 at 12:45 PM art <[hidden email]> wrote:
|
Hi Till, This is the taskManager log As you see, the logs print ‘line 92 -- Could not connect to flink-jobmanager:6123’ then print ‘line 128 --Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.’ And repeat print this A few minutes later, the taskmanger shut down and restart This is my yaml files, could u help me to confirm did I omitted something? Thanks a lot! --------------------------------------------------- flink-configuration-configmap.yaml apiVersion: v1 kind: ConfigMap metadata: name: flink-config labels: app: flink data: flink-conf.yaml: |+ jobmanager.rpc.address: flink-jobmanager taskmanager.numberOfTaskSlots: 1 blob.server.port: 6124 jobmanager.rpc.port: 6123 taskmanager.rpc.port: 6122 queryable-state.proxy.ports: 6125 jobmanager.memory.process.size: 1024m taskmanager.memory.process.size: 1024m parallelism.default: 1 log4j-console.properties: |+ rootLogger.level = INFO rootLogger.appenderRef.console.ref = ConsoleAppender rootLogger.appenderRef.rolling.ref = RollingFileAppender logger.akka.name = akka logger.akka.level = INFO logger.kafka.name= org.apache.kafka logger.kafka.level = INFO logger.hadoop.name = org.apache.hadoop logger.hadoop.level = INFO logger.zookeeper.name = org.apache.zookeeper logger.zookeeper.level = INFO appender.console.name = ConsoleAppender appender.console.type = CONSOLE appender.console.layout.type = PatternLayout appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n appender.rolling.name = RollingFileAppender appender.rolling.type = RollingFile appender.rolling.append = false appender.rolling.fileName = ${sys:log.file} appender.rolling.filePattern = ${sys:log.file}.%i appender.rolling.layout.type = PatternLayout appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n appender.rolling.policies.type = Policies appender.rolling.policies.size.type = SizeBasedTriggeringPolicy appender.rolling.policies.size.size=100MB appender.rolling.strategy.type = DefaultRolloverStrategy appender.rolling.strategy.max = 10 logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline logger.netty.level = OFF --------------------------------------------------- jobmanager-service.yaml apiVersion: v1 kind: Service metadata: name: flink-jobmanager spec: type: ClusterIP ports: - name: rpc port: 6123 - name: blob-server port: 6124 - name: webui port: 8081 selector: app: flink component: jobmanager -------------------------------------------------- jobmanager-session-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: flink-jobmanager spec: replicas: 1 selector: matchLabels: app: flink component: jobmanager template: metadata: labels: app: flink component: jobmanager spec: containers: - name: jobmanager image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1 args: ["jobmanager"] ports: - containerPort: 6123 name: rpc - containerPort: 6124 name: blob-server - containerPort: 8081 name: webui livenessProbe: tcpSocket: port: 6123 initialDelaySeconds: 30 periodSeconds: 60 volumeMounts: - name: flink-config-volume mountPath: /opt/flink/conf securityContext: runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary volumes: - name: flink-config-volume configMap: name: flink-config items: - key: flink-conf.yaml path: flink-conf.yaml - key: log4j-console.properties path: log4j-console.properties imagePullSecrets: - name: regcred --------------------------------------------------- taskmanager-session-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: flink-taskmanager spec: replicas: 1 selector: matchLabels: app: flink component: taskmanager template: metadata: labels: app: flink component: taskmanager spec: containers: - name: taskmanager image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1 args: ["taskmanager"] ports: - containerPort: 6122 name: rpc - containerPort: 6125 name: query-state livenessProbe: tcpSocket: port: 6122 initialDelaySeconds: 30 periodSeconds: 60 volumeMounts: - name: flink-config-volume mountPath: /opt/flink/conf/ securityContext: runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary volumes: - name: flink-config-volume configMap: name: flink-config items: - key: flink-conf.yaml path: flink-conf.yaml - key: log4j-console.properties path: log4j-console.properties imagePullSecrets: - name: regcred On 09/2/2020 20:38,[hidden email] wrote:
|
Hi Till, I find something may be helpful. The kubernetes Dashboard show job-manager ip 172.18.0.5, task-manager ip 172.18.0.6 When I run command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash’ && ‘ping 172.18.0.5’ I can get response But when I ping flink-jobmanager ,there is no response
On 09/3/2020 09:03,[hidden email] wrote:
|
I guess something is wrong with your kube proxy, which causes TaskManager could not connect to JobManager. You could verify this by directly using JobManager Pod ip instead of service name. Please do as follows. * Edit the TaskManager deployment(via kubectl edit flink-taskmanager) and update the args field to the following. args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"] Given that "172.18.0.5" is the JobManager pod ip. * Delete the current TaskManager pod and let restart again * Now check the TaskManager logs to check whether it could register successfully Best, Yang superainbower <[hidden email]> 于2020年9月3日周四 上午9:35写道:
|
HI Yang, I update taskmanager-session-deployment.yaml like this: apiVersion: apps/v1 kind: Deployment metadata: name: flink-taskmanager spec: replicas: 1 selector: matchLabels: app: flink component: taskmanager template: metadata: labels: app: flink component: taskmanager spec: containers: - name: taskmanager image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1 args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"] ports: - containerPort: 6122 name: rpc - containerPort: 6125 name: query-state livenessProbe: tcpSocket: port: 6122 initialDelaySeconds: 30 periodSeconds: 60 volumeMounts: - name: flink-config-volume mountPath: /opt/flink/conf/ securityContext: runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary volumes: - name: flink-config-volume configMap: name: flink-config items: - key: flink-conf.yaml path: flink-conf.yaml - key: log4j-console.properties path: log4j-console.properties imagePullSecrets: - name: regcred And Delete the TaskManager pod and restart it , but the logs print this Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*
It change flink-jobmanager to 172.18.0.5
On 09/3/2020 11:09,[hidden email] wrote:
|
Sorry i forget that the JobManager is binding its rpc address to flink-jobmanager, not the ip address. So you need to also update the jobmanager-session-deployment.yaml with following changes. ... containers: - name: jobmanager env: - name: JM_IP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.podIP image: flink:1.11 args: ["jobmanager", "$(JM_IP)"] ... After then the JobManager is binding the rpc address with its ip. Best, Yang superainbower <[hidden email]> 于2020年9月3日周四 上午11:38写道:
|
In order to exclude a Minikube problem, you could also try to run Flink on an older Minikube and an older K8s version. Our end-to-end tests use Minikube v1.8.2, for example. Cheers, Till On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <[hidden email]> wrote:
|
Hi Till & Yang, I can deploy Flink on kubernetes(not minikube), it works well So there are some problem about my minikube but I can’t find and fix it Anyway I can deploy on k8s now Thanks for your help! On 09/3/2020 15:47,[hidden email] wrote:
|
Great to hear that it works on K8s and letting us know that the problem is likely to be caused by Minikube. Cheers, Till On Fri, Sep 4, 2020 at 8:53 AM superainbower <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |