Hi,

I'm trying to use Flink on native Kubernetes (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/), but I get an error even with the example from the documentation. The job gets submitted but stays in the "created" status until it times out after 5 minutes. In the TaskManager log, I can see the error "Could not resolve ResourceManager address". What could be the issue?

Here are the logs:

> ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=franz-01
2021-05-10 16:05:00,392 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, localhost
2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-05-10 16:05:00,395 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
2021-05-10 16:05:00,396 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-05-10 16:05:00,432 INFO org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
2021-05-10 16:05:02,680 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead 2021-05-10 16:05:02,690 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead 2021-05-10 16:05:02,699 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124 2021-05-10 16:05:02,700 INFO org.apache.flink.kubernetes.utils.KubernetesUtils [] - Kubernetes deployment requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122 2021-05-10 16:05:02,760 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead 2021-05-10 16:05:05,440 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster franz-01 successfully, JobManager Web Interface: http://xxx:8081 Task Manager logs 2021-05-10 14:09:05,463 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.framework.off-heap.size=134217728b 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.network.max=134217730b 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,464 INFO 
org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.network.min=134217730b 2021-05-10 14:09:05,464 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.framework.heap.size=134217728b 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.managed.size=536870920b 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.cpu.cores=1.0 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,465 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.task.heap.size=402653174b 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.task.off-heap.size=0b 2021-05-10 14:09:05,466 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-metaspace.size=268435456b 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 14:09:05,467 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-overhead.max=201326592b 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -D 2021-05-10 
14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - taskmanager.memory.jvm-overhead.min=201326592b 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - --configDir 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - /opt/flink/conf 2021-05-10 14:09:05,470 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Dtaskmanager.resource-id=franz-01-taskmanager-1-1 2021-05-10 14:09:05,471 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.off-heap.size=134217728b 2021-05-10 14:09:05,471 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-overhead.min=201326592b 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Dweb.tmpdir=/tmp/flink-web-e60a7b21-4e2b-4b6c-a0ac-5b08816edcee 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-metaspace.size=268435456b 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.heap.size=1073741824b 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -Djobmanager.memory.jvm-overhead.max=201326592b 2021-05-10 14:09:05,472 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Classpath: /opt/flink/lib/flink-csv-1.12.2.jar:/opt/flink/lib/flink-json-1.12.2.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.12.2.jar:/opt/flink/lib/flink-table_2.12-1.12.2.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.12.2.jar::: 2021-05-10 14:09:05,472 
INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - -------------------------------------------------------------------------------- 2021-05-10 14:09:05,475 INFO org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Registered UNIX signal handlers for [TERM, HUP, INT] 2021-05-10 14:09:05,510 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: blob.server.port, 6124 2021-05-10 14:09:05,511 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m 2021-05-10 14:09:05,511 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint 2021-05-10 14:09:05,513 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region 2021-05-10 14:09:05,514 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, franz-01.default 2021-05-10 14:09:05,514 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: execution.target, kubernetes-session 2021-05-10 14:09:05,515 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.cluster-id, franz-01 2021-05-10 14:09:05,516 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.rpc.port, 6122 2021-05-10 14:09:05,517 INFO org.apache.flink.configuration.GlobalConfiguration 
[] - Loading configuration property: internal.cluster.execution-mode, NORMAL 2021-05-10 14:09:05,517 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1 2021-05-10 14:09:05,519 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2021-05-10 14:09:05,658 INFO org.apache.flink.core.fs.FileSystem [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available. 2021-05-10 14:09:05,733 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath. 2021-05-10 14:09:05,738 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /tmp/jaas-3361029581556571704.conf. 2021-05-10 14:09:05,744 INFO org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath. 2021-05-10 14:09:05,811 INFO org.apache.flink.configuration.Configuration [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address' 2021-05-10 14:09:05,855 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - Trying to select the network interface and address to use by connecting to the leading JobManager. 2021-05-10 14:09:05,855 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager will try to connect for PT10S before falling back to heuristics 2021-05-10 14:09:26,116 WARN org.apache.flink.runtime.net.ConnectionUtils [] - Could not connect to franz-01.default:6123. Selecting a local address using heuristics. 2021-05-10 14:09:26,116 WARN org.apache.flink.runtime.net.ConnectionUtils [] - Could not find any IPv4 address that is not loopback or link-local. Using localhost address. 
2021-05-10 14:09:26,117 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - TaskManager will use hostname/address 'franz-01-taskmanager-1-1' (10.2.2.37) for communication. 2021-05-10 14:09:26,136 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address 10.2.2.37:6122, bind address 0.0.0.0:6122. 2021-05-10 14:09:27,212 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started 2021-05-10 14:09:27,283 INFO akka.remote.Remoting [] - Starting remoting 2021-05-10 14:09:27,586 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink@10.2.2.37:6122] 2021-05-10 14:09:27,730 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink@10.2.2.37:6122 2021-05-10 14:09:27,781 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl [] - No metrics reporter configured, no metrics will be exposed/reported. 2021-05-10 14:09:27,786 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address 10.2.2.37:0, bind address 0.0.0.0:0. 2021-05-10 14:09:27,814 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started 2021-05-10 14:09:27,819 INFO akka.remote.Remoting [] - Starting remoting 2021-05-10 14:09:27,881 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.2.2.37:39177] 2021-05-10 14:09:27,895 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink-metrics@10.2.2.37:39177 2021-05-10 14:09:27,916 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_franz-01-taskmanager-1-1 . 
2021-05-10 14:09:27,931 INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Created BLOB cache storage directory /tmp/blobStore-16255e13-c39a-442f-853a-cd1e331e7325 2021-05-10 14:09:27,934 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Created BLOB cache storage directory /tmp/blobStore-5ac02374-808a-4529-b80c-088dbeac2711 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: [] 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: [] 2021-05-10 14:09:27,955 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Starting TaskManager with ResourceID: franz-01-taskmanager-1-1 2021-05-10 14:09:27,990 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices [] - Temporary file directory '/tmp': total 48 GB, usable 38 GB (79.17% usable) 2021-05-10 14:09:27,997 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager uses directory /tmp/flink-io-c08780a7-90bd-4259-8f51-8a24d95c21df for spill files. 2021-05-10 14:09:28,059 INFO org.apache.flink.runtime.io.network.netty.NettyConfig [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)] 2021-05-10 14:09:28,063 INFO org.apache.flink.runtime.io.disk.FileChannelManagerImpl [] - FileChannelManager uses directory /tmp/flink-netty-shuffle-209ae6cc-6fd5-4c9e-b6df-acc675a6995c for spill files. 2021-05-10 14:09:28,578 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768). 
2021-05-10 14:09:28,594 INFO org.apache.flink.runtime.io.network.NettyShuffleEnvironment [] - Starting the network environment and its components. 2021-05-10 14:09:28,789 INFO org.apache.flink.runtime.io.network.netty.NettyClient [] - Transport type 'auto': using EPOLL. 2021-05-10 14:09:28,791 INFO org.apache.flink.runtime.io.network.netty.NettyClient [] - Successful initialization (took 196 ms). 2021-05-10 14:09:28,796 INFO org.apache.flink.runtime.io.network.netty.NettyServer [] - Transport type 'auto': using EPOLL. 2021-05-10 14:09:28,892 INFO org.apache.flink.runtime.io.network.netty.NettyServer [] - Successful initialization (took 99 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:40399. 2021-05-10 14:09:28,894 INFO org.apache.flink.runtime.taskexecutor.KvStateService [] - Starting the kvState service and its components. 2021-05-10 14:09:28,979 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 . 2021-05-10 14:09:29,002 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service. 2021-05-10 14:09:29,005 INFO org.apache.flink.runtime.filecache.FileCache [] - User file cache uses directory /tmp/flink-dist-cache-bc340200-15c9-4d0a-950a-f43469bdb58d 2021-05-10 14:09:29,055 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000). 2021-05-10 14:09:29,276 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. 
Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default] 2021-05-10 14:09:29,289 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:09:49,314 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:09:59,325 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution] 2021-05-10 14:09:59,327 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:10:19,365 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:10:29,363 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. 
Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution] 2021-05-10 14:10:29,385 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:10:49,425 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. 2021-05-10 14:10:59,423 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution] 2021-05-10 14:10:59,445 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*. Job Manager logs 2021-05-10 14:09:00,393 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job). 2021-05-10 14:09:00,395 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job). 
2021-05-10 14:09:00,524 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 . 2021-05-10 14:09:00,537 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe). 2021-05-10 14:09:00,612 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe). 2021-05-10 14:09:00,665 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe). 2021-05-10 14:09:00,666 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms. 2021-05-10 14:09:00,707 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 15 ms 2021-05-10 14:09:00,742 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880) 2021-05-10 14:09:00,823 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore. 2021-05-10 14:09:00,830 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@43519311 for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe). 2021-05-10 14:09:00,844 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://[hidden email]:6123/user/rpc/jobmanager_2. 
2021-05-10 14:09:00,848 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting execution of job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) under job master id 00000000000000000000000000000000. 2021-05-10 14:09:00,851 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy] 2021-05-10 14:09:00,852 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) switched from state CREATED to RUNNING. 2021-05-10 14:09:00,912 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom Source -> Filter -> Timestamps/Watermarks (1/1) (b10791bc97d1d772bd443abd92bf32c0) switched from CREATED to SCHEDULED. 2021-05-10 14:09:00,913 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Window(ProcessingTimeSessionWindows(60000), ProcessingTimeTrigger, SessionAggregate, PassThroughWindowFunction) -> Sink: Unnamed (1/1) (9ee57af7f96b318d16fb0784a693b481) switched from CREATED to SCHEDULED. 2021-05-10 14:09:00,928 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] 2021-05-10 14:09:00,939 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) 2021-05-10 14:09:00,947 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration 2021-05-10 14:09:00,949 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering job manager [hidden email]://[hidden email]:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe. 
2021-05-10 14:09:01,009 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email]://[hidden email]:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe. 2021-05-10 14:09:01,016 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000. 2021-05-10 14:09:01,018 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] and profile ResourceProfile{UNKNOWN} with allocation id be6a056136c6dec065af876bda1f6dd5 from resource manager. 2021-05-10 14:09:01,020 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job a63f806ba9a172b728395266a6dc41fe with allocation id be6a056136c6dec065af876bda1f6dd5. 2021-05-10 14:09:01,029 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}, current pending count: 1. 2021-05-10 14:09:01,035 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: [] 2021-05-10 14:09:01,414 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating new TaskManager pod with name franz-01-taskmanager-1-1 and resource <1728,1.0>. 2021-05-10 14:09:01,739 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod franz-01-taskmanager-1-1 is created. 
2021-05-10 14:09:01,807 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: franz-01-taskmanager-1-1
2021-05-10 14:09:01,808 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker franz-01-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}.

Help appreciated. Thanks!
Pulling in Yang Wang, who may be able to shed some light on the matter.

You could also have a look at
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Native-kubernetes-setup-failed-to-start-job-td39066.html
Although the issue was not actually resolved there, it may give some hints.

On 5/10/2021 4:40 PM, Valentin Wallyn
wrote:
It seems that the TaskManager pod could not resolve the JobManager address "franz-01.default", which is constructed as "<k8s-service-name>.<namespace>". I think you need to check whether CoreDNS is running normally in your K8s cluster.

You could start a busybox pod on the same node as the TaskManager and then run "nslookup franz-01.default" to verify the DNS resolution.

Best,
Yang

Chesnay Schepler <[hidden email]> wrote on Tue, May 11, 2021 at 6:30 PM:
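
The check Yang describes above could be scripted roughly as follows. This is only a sketch under some assumptions: the helper names (jm_service_host, pod_node, check_dns_from_node, coredns_status) are made up for illustration, the CoreDNS pods are assumed to carry the k8s-app=kube-dns label in the kube-system namespace (common, but not guaranteed on every cluster), and busybox:1.28 is chosen because nslookup in some newer busybox images is reported to behave unreliably.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Flink or kubectl.

# Build the host name the TaskManager tries to resolve: <service>.<namespace>.
jm_service_host() {
    printf '%s.%s\n' "$1" "$2"
}

# Print the name of the node a pod is scheduled on.
# Arguments: pod name, namespace.
pod_node() {
    kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeName}'
}

# Run nslookup for the JobManager service from a throwaway busybox pod
# pinned to the same node as the given TaskManager pod.
# Arguments: cluster-id, namespace, TaskManager pod name.
check_dns_from_node() {
    host=$(jm_service_host "$1" "$2")
    node=$(pod_node "$3" "$2") || return 1
    kubectl run dns-check -n "$2" --rm -i --restart=Never \
        --image=busybox:1.28 \
        --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"$node\"}}" \
        -- nslookup "$host"
}

# List the cluster DNS pods.
# Assumption: they are labelled k8s-app=kube-dns in kube-system.
coredns_status() {
    kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
}
```

With the names from this thread, one would first run "coredns_status" and then "check_dns_from_node franz-01 default franz-01-taskmanager-1-1". If the nslookup fails from that node but works from a pod on another node, the problem is more likely node-level networking or DNS configuration than Flink itself.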