| Hi,

I'm trying to deploy Flink on Minikube, following https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html:

    kubectl create -f flink-configuration-configmap.yaml
    kubectl create -f jobmanager-service.yaml
    kubectl create -f jobmanager-session-deployment.yaml
    kubectl create -f taskmanager-session-deployment.yaml

But I got this:

    2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
    2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
    2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
    2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms].

When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find that I cannot ping flink-jobmanager from the TaskManager.

I am new to K8s. Can anyone point me to a tutorial? Thanks a lot! |
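The `java.net.UnknownHostException` in the log above indicates that the name lookup for `flink-jobmanager` failed before any TCP connection was attempted. As a rough sketch, the same lookup can be reproduced with a few lines of Python. This assumes a Python 3 interpreter is available in the pod where it runs (the Flink image may not ship one, in which case a separate debug pod would be needed). `can_resolve` is a hypothetical helper name, not part of Flink or Kubernetes.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `resolver` is injectable so the check can be exercised without a
    cluster; by default it performs a real lookup via the system resolver.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be mapped to an address; this is
        # roughly the Python counterpart of Java's UnknownHostException.
        return False
```

If `can_resolve("flink-jobmanager")` returns False inside a pod in the same namespace as the Service, the problem most likely lies with cluster DNS rather than with the Flink configuration.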
 
	
					
		
	
					| Hi art, could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using. Cheers, Till On Wed, Sep 2, 2020 at 9:50 AM art <[hidden email]> wrote: 
 | 
 
	
					
		
	
| Hi art, could you check what `kubectl get services` returns? Usually, if you run `kubectl get all`, you should also see the services, but in your case no services are listed. You should see something like service/flink-jobmanager; otherwise the flink-jobmanager service (K8s service) is not running. Cheers, Till On Wed, Sep 2, 2020 at 11:15 AM art <[hidden email]> wrote: 
 | 
| 
Hi Till,

This is the full output of 'kubectl get all':

    NAME                                     READY   STATUS    RESTARTS   AGE
    pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
    pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

    NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
    service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
    service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h

    NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/flink-jobmanager    1/1     1            1           2m34s
    deployment.apps/flink-taskmanager   1/1     1            1           2m34s

    NAME                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
    replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s

I can open the Flink UI, but it shows 0 TaskManagers, so the JobManager itself works. I think the problem is that the TaskManager cannot register itself with the JobManager. Did I miss some configuration?
 | 
 
	
					
		
	
					| Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot. Cheers, Till On Wed, Sep 2, 2020 at 12:45 PM art <[hidden email]> wrote: 
 | 
| 
Hi Till,

This is the TaskManager log. As you can see, line 92 of the log prints 'Could not connect to flink-jobmanager:6123', and then line 128 prints 'Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.' This message is repeated, and a few minutes later the TaskManager shuts down and restarts.

These are my YAML files. Could you help me check whether I omitted something? Thanks a lot!

---------------------------------------------------
flink-configuration-configmap.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: flink-config
      labels:
        app: flink
    data:
      flink-conf.yaml: |+
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 1
        blob.server.port: 6124
        jobmanager.rpc.port: 6123
        taskmanager.rpc.port: 6122
        queryable-state.proxy.ports: 6125
        jobmanager.memory.process.size: 1024m
        taskmanager.memory.process.size: 1024m
        parallelism.default: 1
      log4j-console.properties: |+
        rootLogger.level = INFO
        rootLogger.appenderRef.console.ref = ConsoleAppender
        rootLogger.appenderRef.rolling.ref = RollingFileAppender
        logger.akka.name = akka
        logger.akka.level = INFO
        logger.kafka.name= org.apache.kafka
        logger.kafka.level = INFO
        logger.hadoop.name = org.apache.hadoop
        logger.hadoop.level = INFO
        logger.zookeeper.name = org.apache.zookeeper
        logger.zookeeper.level = INFO
        appender.console.name = ConsoleAppender
        appender.console.type = CONSOLE
        appender.console.layout.type = PatternLayout
        appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
        appender.rolling.name = RollingFileAppender
        appender.rolling.type = RollingFile
        appender.rolling.append = false
        appender.rolling.fileName = ${sys:log.file}
        appender.rolling.filePattern = ${sys:log.file}.%i
        appender.rolling.layout.type = PatternLayout
        appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
        appender.rolling.policies.type = Policies
        appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
        appender.rolling.policies.size.size=100MB
        appender.rolling.strategy.type = DefaultRolloverStrategy
        appender.rolling.strategy.max = 10
        logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
        logger.netty.level = OFF

---------------------------------------------------
jobmanager-service.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: flink-jobmanager
    spec:
      type: ClusterIP
      ports:
      - name: rpc
        port: 6123
      - name: blob-server
        port: 6124
      - name: webui
        port: 8081
      selector:
        app: flink
        component: jobmanager

---------------------------------------------------
jobmanager-session-deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-jobmanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: jobmanager
      template:
        metadata:
          labels:
            app: flink
            component: jobmanager
        spec:
          containers:
          - name: jobmanager
            image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
            args: ["jobmanager"]
            ports:
            - containerPort: 6123
              name: rpc
            - containerPort: 6124
              name: blob-server
            - containerPort: 8081
              name: webui
            livenessProbe:
              tcpSocket:
                port: 6123
              initialDelaySeconds: 30
              periodSeconds: 60
            volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf
            securityContext:
              runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
          volumes:
          - name: flink-config-volume
            configMap:
              name: flink-config
              items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
          imagePullSecrets:
            - name: regcred

---------------------------------------------------
taskmanager-session-deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-taskmanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: taskmanager
      template:
        metadata:
          labels:
            app: flink
            component: taskmanager
        spec:
          containers:
          - name: taskmanager
            image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
            args: ["taskmanager"]
            ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6125
              name: query-state
            livenessProbe:
              tcpSocket:
                port: 6122
              initialDelaySeconds: 30
              periodSeconds: 60
            volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/
            securityContext:
              runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
          volumes:
          - name: flink-config-volume
            configMap:
              name: flink-config
              items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
          imagePullSecrets:
            - name: regcred

On 09/2/2020 20:38,[hidden email] wrote:
 | 
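A Service only routes to pods whose labels contain every key/value pair of its `selector`. The minimal sketch below applies that equality-based rule to the labels from the manifests in this thread. It suggests that the `flink-jobmanager` Service does select the JobManager pod, so the manifests themselves look consistent. `selector_matches` is a hypothetical helper written for illustration, not a Kubernetes API.

```python
def selector_matches(selector, pod_labels):
    """Equality-based matching: every selector pair must appear in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Values copied from the manifests in this thread.
service_selector = {"app": "flink", "component": "jobmanager"}
jobmanager_labels = {"app": "flink", "component": "jobmanager"}
taskmanager_labels = {"app": "flink", "component": "taskmanager"}

print(selector_matches(service_selector, jobmanager_labels))   # True
print(selector_matches(service_selector, taskmanager_labels))  # False
```

On a live cluster, `kubectl get endpoints flink-jobmanager` shows which pod IPs the Service actually resolved to.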
| 
Hi Till, I found something that may be helpful. The Kubernetes Dashboard shows the JobManager pod IP as 172.18.0.5 and the TaskManager pod IP as 172.18.0.6. When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5', I get a response. But when I ping flink-jobmanager, there is no response. On 09/3/2020 09:03,[hidden email] wrote:  
 | 
| I guess something is wrong with your kube-proxy, which prevents the TaskManager from connecting to the JobManager. You can verify this by using the JobManager pod IP directly instead of the service name. Please do as follows.

* Edit the TaskManager deployment (via kubectl edit deployment flink-taskmanager) and update the args field to the following, given that "172.18.0.5" is the JobManager pod IP:

    args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]

* Delete the current TaskManager pod and let it restart.
* Check the TaskManager logs to see whether it registers successfully.

Best, Yang

superainbower <[hidden email]> wrote on Thu, Sep 3, 2020 at 9:35 AM: 
 | 
| Hi Yang,

I updated taskmanager-session-deployment.yaml like this:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-taskmanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: flink
          component: taskmanager
      template:
        metadata:
          labels:
            app: flink
            component: taskmanager
        spec:
          containers:
          - name: taskmanager
            image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
            args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
            ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6125
              name: query-state
            livenessProbe:
              tcpSocket:
                port: 6122
              initialDelaySeconds: 30
              periodSeconds: 60
            volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/
            securityContext:
              runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
          volumes:
          - name: flink-config-volume
            configMap:
              name: flink-config
              items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
          imagePullSecrets:
            - name: regcred

Then I deleted the TaskManager pod and let it restart, but the logs print this:

    Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*

The address changed from flink-jobmanager to 172.18.0.5, but the TaskManager still cannot connect.

On 09/3/2020 11:09,[hidden email] wrote:
 | 
| Sorry, I forgot that the JobManager binds its RPC address to flink-jobmanager, not to the IP address. So you also need to update jobmanager-session-deployment.yaml with the following changes.

    ...
          containers:
          - name: jobmanager
            env:
            - name: JM_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            image: flink:1.11
            args: ["jobmanager", "$(JM_IP)"]
    ...

After that, the JobManager binds its RPC address to its pod IP.

Best, Yang

superainbower <[hidden email]> wrote on Thu, Sep 3, 2020 at 11:38 AM: 
 | 
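Kubernetes expands `$(VAR_NAME)` references in a container's `args` from that container's environment, which is how `$(JM_IP)` in the snippet above becomes the pod IP taken from `status.podIP`. The following is a simplified model of that substitution, not the Kubernetes implementation. `expand_args` is a hypothetical name, and the IP is the one reported earlier in this thread.

```python
import re

# Matches either an escaped "$$" or a "$(NAME)" reference.
_REF = re.compile(r"\$\$|\$\(([^()]+)\)")


def expand_args(args, env):
    """Expand $(NAME) in each argument using `env`.

    Unknown names are left unchanged and "$$" collapses to a literal "$",
    following the documented behaviour for container command/args.
    """
    def substitute(match):
        name = match.group(1)
        if name is None:  # the match was the escape sequence "$$"
            return "$"
        return env.get(name, match.group(0))

    return [_REF.sub(substitute, arg) for arg in args]


print(expand_args(["jobmanager", "$(JM_IP)"], {"JM_IP": "172.18.0.5"}))
# ['jobmanager', '172.18.0.5']
```

Because an undefined variable is left as the literal text `$(JM_IP)`, a typo in the `env` entry would presumably pass that literal string to the JobManager as its host name, so it is worth checking the resolved args in the JobManager log after the change.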
 
	
					
		
	
					| In order to exclude a Minikube problem, you could also try to run Flink on an older Minikube and an older K8s version. Our end-to-end tests use Minikube v1.8.2, for example. Cheers, Till On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <[hidden email]> wrote: 
 | 
| 
Hi Till & Yang, I can deploy Flink on Kubernetes (not Minikube), and it works well. So there is some problem with my Minikube setup, but I can't find and fix it. Anyway, I can deploy on K8s now. Thanks for your help! On 09/3/2020 15:47,[hidden email] wrote:  
 | 
 
	
					
		
	
| Great to hear that it works on K8s, and thanks for letting us know that the problem is likely caused by Minikube. Cheers, Till On Fri, Sep 4, 2020 at 8:53 AM superainbower <[hidden email]> wrote: 
 | 