I resolved this by changing the jobmanager-rest-service.yaml (Changed type to ClusterIP and removed nodePort
apiVersion: v1
kind: Service
metadata:
name: flink-jobmanager-rest
spec:
type: ClusterIP
ports:
- name: rest
port: 8081
targetPort: 8081
#nodePort: 30081
selector:
app: flink
component: jobmanager
It seems that you are using the NodePort to expose the rest service. If you only want to access the Flink UI/rest in the K8s cluster,then I would suggest to set "kubernetes.rest-service.exposed.type" to "ClusterIP". Because we are using the K8s master node toconstruct the JobManager rest endpoint when using NodePort. Sometime, it is not accessible due to firewall.Best,YangRobert Metzger <[hidden email]> 于2021年5月6日周四 上午2:08写道:Okay, it appears to have resolved 10.43.0.1:30081 as the address of the JobManager. Most likely, the container can not access this address. Can you validate this from within the container?If I understand the Flink documentation correctly, you should be able to manually specifyrest.address
,rest.port
for the JobManager address. If you can manually figure out an address to the JobManager service, and pass it to Flink, the submission should work.On Wed, May 5, 2021 at 7:15 PM Robert Cullen <[hidden email]> wrote:Thanks for the reply. Here is an updated exception with DEBUG on. It appears to be timing out:
2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting namespace of Kubernetes client to cmdaa 2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting max concurrent requests of Kubernetes client to 64 2021-05-05 16:56:20,176 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://10.43.0.1:30081 2021-05-05 16:56:20,239 INFO org.apache.flink.client.cli.CliFrontend [] - Waiting for response... 2021-05-05 17:02:09,605 ERROR org.apache.flink.client.cli.CliFrontend [] - Error while running the command. org.apache.flink.util.FlinkException: Failed to retrieve job list. at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) [flink-dist_2.12-1.13.0.jar:1.13.0] Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted. at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292] at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081 at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292] at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081 at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] ngleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
On Wed, May 5, 2021 at 6:59 AM Robert Metzger <[hidden email]> wrote:Hi,can you check the client log in the "log/" directory?The Flink client will try to access the K8s API server to retrieve the endpoint of the jobmanager. For that, the pod needs to have permissions (through a service account) to make such calls to K8s. My hope is that the logs or previous messages are giving an indication into what Flink is trying to do.Can you also try running on DEBUG log level? (should be the log4j-cli.properties file).On Tue, May 4, 2021 at 3:17 PM Robert Cullen <[hidden email]> wrote:I have a flink cluster running in kubernetes, just the basic installation with one JobManager and two TaskManagers. I want to interact with it via command line from a separate container ie:
root@flink-client:/opt/flink# ./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=job-manager
How do you interact in the same kubernetes instance via CLI (Not from the desktop)? This is the exception:
------------------------------------------------------------ The program finished with the following exception: java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:103) at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:145) at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:67) at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1001) at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager ... 9 more root@flink-client:/opt/flink#
--Robert Cullen
240-475-4490
--Robert Cullen
240-475-4490
Free forum by Nabble | Edit this page |