|
I resolved this by changing the jobmanager-rest-service.yaml (Changed type to ClusterIP and removed nodePort
apiVersion: v1
kind: Service
metadata:
name: flink-jobmanager-rest
spec:
type: ClusterIP
ports:
- name: rest
port: 8081
targetPort: 8081
#nodePort: 30081
selector:
app: flink
component: jobmanager
It seems that you are using the NodePort to expose the rest service. If you only want to access the Flink UI/rest in the K8s cluster, then I would suggest to set "kubernetes.rest-service.exposed.type" to "ClusterIP". Because we are using the K8s master node to construct the JobManager rest endpoint when using NodePort. Sometime, it is not accessible due to firewall.
Best, Yang
Okay, it appears to have resolved 10.43.0.1:30081 as the address of the JobManager. Most likely, the container can not access this address. Can you validate this from within the container?
If I understand the Flink documentation correctly, you should be able to manually specify rest.address , rest.port for the JobManager address. If you can manually figure out an address to the JobManager service, and pass it to Flink, the submission should work.
Thanks for the reply. Here is an updated exception with DEBUG on. It appears to be timing out:
2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting namespace of Kubernetes client to cmdaa
2021-05-05 16:56:19,700 DEBUG org.apache.flink.kubernetes.kubeclient.FlinkKubeClientFactory [] - Setting max concurrent requests of Kubernetes client to 64
2021-05-05 16:56:20,176 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://10.43.0.1:30081
2021-05-05 16:56:20,239 INFO org.apache.flink.client.cli.CliFrontend [] - Waiting for response...
2021-05-05 17:02:09,605 ERROR org.apache.flink.client.cli.CliFrontend [] - Error while running the command.
org.apache.flink.util.FlinkException: Failed to retrieve job list.
at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:449) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) [flink-dist_2.12-1.13.0.jar:1.13.0]
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:386) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292]
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_292]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_292]
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:430) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: /10.43.0.1:30081
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
ngleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.12-1.13.0.jar:1.13.0]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Hi, can you check the client log in the "log/" directory? The Flink client will try to access the K8s API server to retrieve the endpoint of the jobmanager. For that, the pod needs to have permissions (through a service account) to make such calls to K8s. My hope is that the logs or previous messages are giving an indication into what Flink is trying to do. Can you also try running on DEBUG log level? (should be the log4j-cli.properties file).
I have a flink cluster running in kubernetes, just the basic installation with one JobManager and two TaskManagers. I want to interact with it via command line from a separate container ie:
root@flink-client:/opt/flink# ./bin/flink list --target kubernetes-application -Dkubernetes.cluster-id=job-manager
How do you interact in the same kubernetes instance via CLI (Not from the desktop)? This is the exception:
------------------------------------------------------------
The program finished with the following exception:
java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:103)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:145)
at org.apache.flink.kubernetes.KubernetesClusterDescriptor.retrieve(KubernetesClusterDescriptor.java:67)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1001)
at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:427)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1060)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of job-manager
... 9 more
root@flink-client:/opt/flink#
-- Robert Cullen 240-475-4490
--
Robert Cullen 240-475-4490
-- Robert Cullen 240-475-4490
|