Could not resolve ResourceManager address in native kubernetes

Valentin Wallyn
Hi,

I'm trying to run Flink on native Kubernetes (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/), but I get an error even with the example from the documentation.

The job gets submitted but stays in "created" status until it times out after 5 minutes. In the TaskManager log, I can see that the error is "Could not resolve ResourceManager address".

What could be the issue?


Here are the logs:

> ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=franz-01

2021-05-10 16:05:00,392 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, localhost
2021-05-10 16:05:00,395 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-05-10 16:05:00,395 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-05-10 16:05:00,395 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-05-10 16:05:00,395 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-05-10 16:05:00,395 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2021-05-10 16:05:00,396 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-05-10 16:05:00,432 INFO  org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
2021-05-10 16:05:02,680 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2021-05-10 16:05:02,690 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (172.800mb (181193935 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2021-05-10 16:05:02,699 INFO  org.apache.flink.kubernetes.utils.KubernetesUtils            [] - Kubernetes deployment requires a fixed port. Configuration blob.server.port will be set to 6124
2021-05-10 16:05:02,700 INFO  org.apache.flink.kubernetes.utils.KubernetesUtils            [] - Kubernetes deployment requires a fixed port. Configuration taskmanager.rpc.port will be set to 6122
2021-05-10 16:05:02,760 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2021-05-10 16:05:05,440 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Create flink session cluster franz-01 successfully, JobManager Web Interface: http://xxx:8081
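
The "derived from fraction jvm overhead memory" lines above look informational rather than related to the failure. As a rough sketch of the arithmetic they describe: the overhead is a fraction of the process size, clamped to a minimum and maximum. The 192 MB minimum comes from the log itself; the 0.1 fraction and the 1024 MB maximum are assumed defaults, and the function name is hypothetical.

```python
def jvm_overhead_mb(process_size_mb, fraction=0.1, min_mb=192, max_mb=1024):
    """Clamp fraction * process size into [min_mb, max_mb].

    fraction and max_mb are assumed defaults; min_mb matches the value in the log.
    """
    derived = process_size_mb * fraction
    return min(max(derived, min_mb), max_mb)


# jobmanager.memory.process.size = 1600m: 160 MB derived, raised to the 192 MB minimum
print(jvm_overhead_mb(1600))
# taskmanager.memory.process.size = 1728m: 172.8 MB derived, raised to the 192 MB minimum
print(jvm_overhead_mb(1728))
```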



Task Manager logs

2021-05-10 14:09:05,463 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.framework.off-heap.size=134217728b
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.network.max=134217730b
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.network.min=134217730b
2021-05-10 14:09:05,464 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.framework.heap.size=134217728b
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.managed.size=536870920b
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.cpu.cores=1.0
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,465 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.task.heap.size=402653174b
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.task.off-heap.size=0b
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-metaspace.size=268435456b
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-overhead.max=201326592b
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-overhead.min=201326592b
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     --configDir
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     /opt/flink/conf
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Dtaskmanager.resource-id=franz-01-taskmanager-1-1
2021-05-10 14:09:05,471 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.off-heap.size=134217728b
2021-05-10 14:09:05,471 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-overhead.min=201326592b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Dweb.tmpdir=/tmp/flink-web-e60a7b21-4e2b-4b6c-a0ac-5b08816edcee
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-metaspace.size=268435456b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.heap.size=1073741824b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-overhead.max=201326592b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -  Classpath: /opt/flink/lib/flink-csv-1.12.2.jar:/opt/flink/lib/flink-json-1.12.2.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.12.2.jar:/opt/flink/lib/flink-table_2.12-1.12.2.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.12.2.jar:::
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - --------------------------------------------------------------------------------
2021-05-10 14:09:05,475 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Registered UNIX signal handlers for [TERM, HUP, INT]
2021-05-10 14:09:05,510 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: blob.server.port, 6124
2021-05-10 14:09:05,511 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-05-10 14:09:05,511 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint
2021-05-10 14:09:05,513 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, franz-01.default
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.target, kubernetes-session
2021-05-10 14:09:05,515 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.cluster-id, franz-01
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.rpc.port, 6122
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2021-05-10 14:09:05,519 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-05-10 14:09:05,658 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-05-10 14:09:05,733 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,738 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-3361029581556571704.conf.
2021-05-10 14:09:05,744 INFO  org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,811 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not connect to franz-01.default:6123. Selecting a local address using heuristics.
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not find any IPv4 address that is not loopback or link-local. Using localhost address.
2021-05-10 14:09:26,117 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - TaskManager will use hostname/address 'franz-01-taskmanager-1-1' (10.2.2.37) for communication.
2021-05-10 14:09:26,136 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:6122, bind address 0.0.0.0:6122.
2021-05-10 14:09:27,212 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,283 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,586 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink@10.2.2.37:6122]
2021-05-10 14:09:27,730 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink@10.2.2.37:6122
2021-05-10 14:09:27,781 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - No metrics reporter configured, no metrics will be exposed/reported.
2021-05-10 14:09:27,786 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:0, bind address 0.0.0.0:0.
2021-05-10 14:09:27,814 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,819 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,881 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.2.2.37:39177]
2021-05-10 14:09:27,895 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink-metrics@10.2.2.37:39177
2021-05-10 14:09:27,916 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_franz-01-taskmanager-1-1 .
2021-05-10 14:09:27,931 INFO  org.apache.flink.runtime.blob.PermanentBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-16255e13-c39a-442f-853a-cd1e331e7325
2021-05-10 14:09:27,934 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-5ac02374-808a-4529-b80c-088dbeac2711
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Starting TaskManager with ResourceID: franz-01-taskmanager-1-1
2021-05-10 14:09:27,990 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices    [] - Temporary file directory '/tmp': total 48 GB, usable 38 GB (79.17% usable)
2021-05-10 14:09:27,997 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-io-c08780a7-90bd-4259-8f51-8a24d95c21df for spill files.
2021-05-10 14:09:28,059 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig        [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-05-10 14:09:28,063 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-netty-shuffle-209ae6cc-6fd5-4c9e-b6df-acc675a6995c for spill files.
2021-05-10 14:09:28,578 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768).
2021-05-10 14:09:28,594 INFO  org.apache.flink.runtime.io.network.NettyShuffleEnvironment  [] - Starting the network environment and its components.
2021-05-10 14:09:28,789 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,791 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Successful initialization (took 196 ms).
2021-05-10 14:09:28,796 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,892 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Successful initialization (took 99 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:40399.
2021-05-10 14:09:28,894 INFO  org.apache.flink.runtime.taskexecutor.KvStateService         [] - Starting the kvState service and its components.
2021-05-10 14:09:28,979 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
2021-05-10 14:09:29,002 INFO  org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
2021-05-10 14:09:29,005 INFO  org.apache.flink.runtime.filecache.FileCache                 [] - User file cache uses directory /tmp/flink-dist-cache-bc340200-15c9-4d0a-950a-f43469bdb58d
2021-05-10 14:09:29,055 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Connecting to ResourceManager akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2021-05-10 14:09:29,276 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default]
2021-05-10 14:09:29,289 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:09:49,314 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:09:59,325 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:09:59,327 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:10:19,365 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:10:29,363 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:29,385 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:10:49,425 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
2021-05-10 14:10:59,423 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://[hidden email]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[hidden email]:6123]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:59,445 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*.
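
The WARN lines above carry the most specific hint: a `java.net.UnknownHostException` for the JobManager host name. A minimal sketch for pulling such host names out of a saved TaskManager log, assuming the message format shown above (the function name and regular expression are my own, not part of Flink):

```python
import re

# Captures the host name that follows "java.net.UnknownHostException: " in a log line.
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: ([A-Za-z0-9.-]+)")


def unresolved_hosts(log_lines):
    """Return the set of host names that the log reports as unresolvable."""
    hosts = set()
    for line in log_lines:
        match = UNKNOWN_HOST.search(line)
        if match:
            hosts.add(match.group(1).rstrip("."))
    return hosts


sample = [
    "Caused by: [java.net.UnknownHostException: franz-01.default]",
    "Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]",
    "INFO  Starting the kvState service and its components.",
]
print(unresolved_hosts(sample))
```

Here both WARN variants point at the same name, `franz-01.default`, which is the value of `jobmanager.rpc.address` loaded by the TaskManager.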


Job Manager logs

2021-05-10 14:09:00,393 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received JobGraph submission a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,395 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting job a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,524 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2021-05-10 14:09:00,537 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Initializing job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,612 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,665 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running initialization on master for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,666 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Successfully ran initialization on master in 0 ms.
2021-05-10 14:09:00,707 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 15 ms
2021-05-10 14:09:00,742 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
2021-05-10 14:09:00,823 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No checkpoint found during restore.
2021-05-10 14:09:00,830 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@43519311 for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,844 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - JobManager runner for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://[hidden email]:6123/user/rpc/jobmanager_2.
2021-05-10 14:09:00,848 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting execution of job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) under job master id 00000000000000000000000000000000.
2021-05-10 14:09:00,851 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-05-10 14:09:00,852 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) switched from state CREATED to RUNNING.
2021-05-10 14:09:00,912 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Custom Source -> Filter -> Timestamps/Watermarks (1/1) (b10791bc97d1d772bd443abd92bf32c0) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Window(ProcessingTimeSessionWindows(60000), ProcessingTimeTrigger, SessionAggregate, PassThroughWindowFunction) -> Sink: Unnamed (1/1) (9ee57af7f96b318d16fb0784a693b481) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,928 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}]
2021-05-10 14:09:00,939 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Connecting to ResourceManager akka.tcp://[hidden email]:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2021-05-10 14:09:00,947 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Resolved ResourceManager address, beginning registration
2021-05-10 14:09:00,949 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering job manager [hidden email]://[hidden email]:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,009 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email]://[hidden email]:6123/user/rpc/jobmanager_2 for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,016 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2021-05-10 14:09:01,018 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Requesting new slot [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] and profile ResourceProfile{UNKNOWN} with allocation id be6a056136c6dec065af876bda1f6dd5 from resource manager.
2021-05-10 14:09:01,020 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job a63f806ba9a172b728395266a6dc41fe with allocation id be6a056136c6dec065af876bda1f6dd5.
2021-05-10 14:09:01,029 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}, current pending count: 1.
2021-05-10 14:09:01,035 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:01,414 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating new TaskManager pod with name franz-01-taskmanager-1-1 and resource <1728,1.0>.
2021-05-10 14:09:01,739 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod franz-01-taskmanager-1-1 is created.
2021-05-10 14:09:01,807 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received new TaskManager pod: franz-01-taskmanager-1-1
2021-05-10 14:09:01,808 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker franz-01-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}.


Any help is appreciated. Thanks!

Re: Could not resolve ResourceManager address in native kubernetes

Chesnay Schepler
Pulling in Yang Wang, who may be able to shed some light on the matter.

You could also have a look at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Native-kubernetes-setup-failed-to-start-job-td39066.html. The issue there was not actually resolved, but the thread may give some hints.
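
Since the TaskManager log reports an `UnknownHostException` for `franz-01.default`, one thing that could be checked is whether that name resolves at all from inside the TaskManager pod. A minimal sketch, assuming Python is available in the pod or in a debug container in the same namespace (the function name is hypothetical; port 6123 is the `jobmanager.rpc.port` from the logs):

```python
import socket


def can_resolve(hostname, port=6123):
    """Return True if the local resolver yields at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Name resolution failed (unknown host, temporary DNS failure, ...).
        return False
```

Calling `can_resolve("franz-01.default")` inside the pod would show whether the short service name resolves. Trying the fully qualified form as well, for example `franz-01.default.svc.cluster.local` (this assumes the default `cluster.local` cluster domain), may help tell a DNS search-path problem apart from a missing service.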

On 5/10/2021 4:40 PM, Valentin Wallyn wrote:
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.task.off-heap.size=0b
2021-05-10 14:09:05,466 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-metaspace.size=268435456b
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,467 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-overhead.max=201326592b
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -D
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     taskmanager.memory.jvm-overhead.min=201326592b
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     --configDir
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     /opt/flink/conf
2021-05-10 14:09:05,470 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Dtaskmanager.resource-id=franz-01-taskmanager-1-1
2021-05-10 14:09:05,471 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.off-heap.size=134217728b
2021-05-10 14:09:05,471 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-overhead.min=201326592b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Dweb.tmpdir=/tmp/flink-web-e60a7b21-4e2b-4b6c-a0ac-5b08816edcee
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-metaspace.size=268435456b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.heap.size=1073741824b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -     -Djobmanager.memory.jvm-overhead.max=201326592b
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] -  Classpath: /opt/flink/lib/flink-csv-1.12.2.jar:/opt/flink/lib/flink-json-1.12.2.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.12.2.jar:/opt/flink/lib/flink-table_2.12-1.12.2.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.12.2.jar:::
2021-05-10 14:09:05,472 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - --------------------------------------------------------------------------------
2021-05-10 14:09:05,475 INFO  org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner [] - Registered UNIX signal handlers for [TERM, HUP, INT]
2021-05-10 14:09:05,510 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: blob.server.port, 6124
2021-05-10 14:09:05,511 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-05-10 14:09:05,511 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint
2021-05-10 14:09:05,513 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, franz-01.default
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.target, kubernetes-session
2021-05-10 14:09:05,515 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.cluster-id, franz-01
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.rpc.port, 6122
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2021-05-10 14:09:05,519 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-05-10 14:09:05,658 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-05-10 14:09:05,733 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,738 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-3361029581556571704.conf.
2021-05-10 14:09:05,744 INFO  org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,811 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not connect to franz-01.default:6123. Selecting a local address using heuristics.
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not find any IPv4 address that is not loopback or link-local. Using localhost address.
2021-05-10 14:09:26,117 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - TaskManager will use hostname/address 'franz-01-taskmanager-1-1' (10.2.2.37) for communication.
2021-05-10 14:09:26,136 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:6122, bind address 0.0.0.0:6122.
2021-05-10 14:09:27,212 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,283 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,586 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink@10.2.2.37:6122]
2021-05-10 14:09:27,730 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink@10.2.2.37:6122
2021-05-10 14:09:27,781 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - No metrics reporter configured, no metrics will be exposed/reported.
2021-05-10 14:09:27,786 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:0, bind address 0.0.0.0:0.
2021-05-10 14:09:27,814 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,819 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,881 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.2.2.37:39177]
2021-05-10 14:09:27,895 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink-metrics@10.2.2.37:39177
2021-05-10 14:09:27,916 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_franz-01-taskmanager-1-1 .
2021-05-10 14:09:27,931 INFO  org.apache.flink.runtime.blob.PermanentBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-16255e13-c39a-442f-853a-cd1e331e7325
2021-05-10 14:09:27,934 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-5ac02374-808a-4529-b80c-088dbeac2711
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Starting TaskManager with ResourceID: franz-01-taskmanager-1-1
2021-05-10 14:09:27,990 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices    [] - Temporary file directory '/tmp': total 48 GB, usable 38 GB (79.17% usable)
2021-05-10 14:09:27,997 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-io-c08780a7-90bd-4259-8f51-8a24d95c21df for spill files.
2021-05-10 14:09:28,059 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig        [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-05-10 14:09:28,063 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-netty-shuffle-209ae6cc-6fd5-4c9e-b6df-acc675a6995c for spill files.
2021-05-10 14:09:28,578 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768).
2021-05-10 14:09:28,594 INFO  org.apache.flink.runtime.io.network.NettyShuffleEnvironment  [] - Starting the network environment and its components.
2021-05-10 14:09:28,789 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,791 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Successful initialization (took 196 ms).
2021-05-10 14:09:28,796 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,892 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Successful initialization (took 99 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:40399.
2021-05-10 14:09:28,894 INFO  org.apache.flink.runtime.taskexecutor.KvStateService         [] - Starting the kvState service and its components.
2021-05-10 14:09:28,979 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
2021-05-10 14:09:29,002 INFO  org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
2021-05-10 14:09:29,005 INFO  org.apache.flink.runtime.filecache.FileCache                 [] - User file cache uses directory /tmp/flink-dist-cache-bc340200-15c9-4d0a-950a-f43469bdb58d
2021-05-10 14:09:29,055 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Connecting to ResourceManager [hidden email](00000000000000000000000000000000).
2021-05-10 14:09:29,276 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default]
2021-05-10 14:09:29,289 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:09:49,314 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:09:59,325 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:09:59,327 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:19,365 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:29,363 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:29,385 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:49,425 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:59,423 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:59,445 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].


Job Manager logs

2021-05-10 14:09:00,393 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received JobGraph submission a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,395 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting job a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,524 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2021-05-10 14:09:00,537 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Initializing job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,612 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,665 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running initialization on master for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,666 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Successfully ran initialization on master in 0 ms.
2021-05-10 14:09:00,707 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 15 ms
2021-05-10 14:09:00,742 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
2021-05-10 14:09:00,823 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No checkpoint found during restore.
2021-05-10 14:09:00,830 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@43519311 for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,844 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - JobManager runner for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at [hidden email].
2021-05-10 14:09:00,848 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting execution of job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) under job master id 00000000000000000000000000000000.
2021-05-10 14:09:00,851 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-05-10 14:09:00,852 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) switched from state CREATED to RUNNING.
2021-05-10 14:09:00,912 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Custom Source -> Filter -> Timestamps/Watermarks (1/1) (b10791bc97d1d772bd443abd92bf32c0) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Window(ProcessingTimeSessionWindows(60000), ProcessingTimeTrigger, SessionAggregate, PassThroughWindowFunction) -> Sink: Unnamed (1/1) (9ee57af7f96b318d16fb0784a693b481) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,928 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}]
2021-05-10 14:09:00,939 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Connecting to ResourceManager [hidden email](00000000000000000000000000000000)
2021-05-10 14:09:00,947 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Resolved ResourceManager address, beginning registration
2021-05-10 14:09:00,949 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering job manager [hidden email] for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,009 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email] for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,016 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2021-05-10 14:09:01,018 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Requesting new slot [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] and profile ResourceProfile{UNKNOWN} with allocation id be6a056136c6dec065af876bda1f6dd5 from resource manager.
2021-05-10 14:09:01,020 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job a63f806ba9a172b728395266a6dc41fe with allocation id be6a056136c6dec065af876bda1f6dd5.
2021-05-10 14:09:01,029 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}, current pending count: 1.
2021-05-10 14:09:01,035 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:01,414 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating new TaskManager pod with name franz-01-taskmanager-1-1 and resource <1728,1.0>.
2021-05-10 14:09:01,739 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod franz-01-taskmanager-1-1 is created.
2021-05-10 14:09:01,807 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received new TaskManager pod: franz-01-taskmanager-1-1
2021-05-10 14:09:01,808 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker franz-01-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}.


Any help is appreciated. Thanks!


Re: Could not resolve ResourceManager address in native kubernetes

Yang Wang
It seems that the TaskManager pod could not resolve the JobManager address "franz-01.default", which is constructed as "k8s-service-name.namespace".
I think you need to check whether CoreDNS is running normally in your K8s cluster. You could start a busybox pod on the same node as the
TaskManager and then run "nslookup franz-01.default" to verify the DNS resolution.
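
Since the TaskManager fails with java.net.UnknownHostException, the same lookup can also be reproduced from a JVM. The following is only a minimal sketch, not part of Flink: the class name DnsCheck and its helper methods are hypothetical, and the ".svc.cluster.local" suffix assumes the cluster uses the default Kubernetes DNS domain, which may differ in your setup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

public class DnsCheck {

    // Builds "<service>.<namespace>", the form seen in the TaskManager log (franz-01.default).
    public static String serviceHost(String service, String namespace) {
        return service + "." + namespace;
    }

    // Returns the resolved IP address, or empty if the JVM cannot resolve the name.
    public static Optional<String> resolve(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String service = args.length > 0 ? args[0] : "franz-01";
        String namespace = args.length > 1 ? args[1] : "default";
        String shortName = serviceHost(service, namespace);
        // Assumption: the cluster uses the default DNS domain "cluster.local".
        String fqdn = shortName + ".svc.cluster.local";
        for (String host : new String[] {shortName, fqdn}) {
            System.out.println(host + " -> " + resolve(host).orElse("UNRESOLVED"));
        }
    }
}
```

With a JDK 11 or later available in a pod in the same namespace (an assumption, since many images ship only a JRE), this can be run as "java DnsCheck.java franz-01 default". If both names come back as UNRESOLVED, the problem is most likely in the cluster DNS rather than in Flink.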

Best,
Yang

On Tue, May 11, 2021 at 6:30 PM, Chesnay Schepler <[hidden email]> wrote:
Pulling in Yang Wang who may shed some light on the matter.

You could also have a look at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Native-kubernetes-setup-failed-to-start-job-td39066.html; while the issue was not actually resolved, it may give some hints.

On 5/10/2021 4:40 PM, Valentin Wallyn wrote:
2021-05-10 14:09:05,513 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, franz-01.default
2021-05-10 14:09:05,514 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.target, kubernetes-session
2021-05-10 14:09:05,515 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.cluster-id, franz-01
2021-05-10 14:09:05,516 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.rpc.port, 6122
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
2021-05-10 14:09:05,517 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2021-05-10 14:09:05,519 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-05-10 14:09:05,658 INFO  org.apache.flink.core.fs.FileSystem                          [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-05-10 14:09:05,733 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,738 INFO  org.apache.flink.runtime.security.modules.JaasModule         [] - Jaas file will be created as /tmp/jaas-3361029581556571704.conf.
2021-05-10 14:09:05,744 INFO  org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-05-10 14:09:05,811 INFO  org.apache.flink.configuration.Configuration                 [] - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2021-05-10 14:09:05,855 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils           [] - TaskManager will try to connect for PT10S before falling back to heuristics
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not connect to franz-01.default:6123. Selecting a local address using heuristics.
2021-05-10 14:09:26,116 WARN  org.apache.flink.runtime.net.ConnectionUtils                 [] - Could not find any IPv4 address that is not loopback or link-local. Using localhost address.
2021-05-10 14:09:26,117 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - TaskManager will use hostname/address 'franz-01-taskmanager-1-1' (10.2.2.37) for communication.
2021-05-10 14:09:26,136 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:6122, bind address 0.0.0.0:6122.
2021-05-10 14:09:27,212 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,283 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,586 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink@10.2.2.37:6122]
2021-05-10 14:09:27,730 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink@10.2.2.37:6122
2021-05-10 14:09:27,781 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl          [] - No metrics reporter configured, no metrics will be exposed/reported.
2021-05-10 14:09:27,786 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Trying to start actor system, external address 10.2.2.37:0, bind address 0.0.0.0:0.
2021-05-10 14:09:27,814 INFO  akka.event.slf4j.Slf4jLogger                                 [] - Slf4jLogger started
2021-05-10 14:09:27,819 INFO  akka.remote.Remoting                                         [] - Starting remoting
2021-05-10 14:09:27,881 INFO  akka.remote.Remoting                                         [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.2.2.37:39177]
2021-05-10 14:09:27,895 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils        [] - Actor system started at akka.tcp://flink-metrics@10.2.2.37:39177
2021-05-10 14:09:27,916 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_franz-01-taskmanager-1-1 .
2021-05-10 14:09:27,931 INFO  org.apache.flink.runtime.blob.PermanentBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-16255e13-c39a-442f-853a-cd1e331e7325
2021-05-10 14:09:27,934 INFO  org.apache.flink.runtime.blob.TransientBlobCache             [] - Created BLOB cache storage directory /tmp/blobStore-5ac02374-808a-4529-b80c-088dbeac2711
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:27,955 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Starting TaskManager with ResourceID: franz-01-taskmanager-1-1
2021-05-10 14:09:27,990 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices    [] - Temporary file directory '/tmp': total 48 GB, usable 38 GB (79.17% usable)
2021-05-10 14:09:27,997 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-io-c08780a7-90bd-4259-8f51-8a24d95c21df for spill files.
2021-05-10 14:09:28,059 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig        [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 1 (manual), number of client threads: 1 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2021-05-10 14:09:28,063 INFO  org.apache.flink.runtime.io.disk.FileChannelManagerImpl      [] - FileChannelManager uses directory /tmp/flink-netty-shuffle-209ae6cc-6fd5-4c9e-b6df-acc675a6995c for spill files.
2021-05-10 14:09:28,578 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768).
2021-05-10 14:09:28,594 INFO  org.apache.flink.runtime.io.network.NettyShuffleEnvironment  [] - Starting the network environment and its components.
2021-05-10 14:09:28,789 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,791 INFO  org.apache.flink.runtime.io.network.netty.NettyClient        [] - Successful initialization (took 196 ms).
2021-05-10 14:09:28,796 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Transport type 'auto': using EPOLL.
2021-05-10 14:09:28,892 INFO  org.apache.flink.runtime.io.network.netty.NettyServer        [] - Successful initialization (took 99 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:40399.
2021-05-10 14:09:28,894 INFO  org.apache.flink.runtime.taskexecutor.KvStateService         [] - Starting the kvState service and its components.
2021-05-10 14:09:28,979 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
2021-05-10 14:09:29,002 INFO  org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
2021-05-10 14:09:29,005 INFO  org.apache.flink.runtime.filecache.FileCache                 [] - User file cache uses directory /tmp/flink-dist-cache-bc340200-15c9-4d0a-950a-f43469bdb58d
2021-05-10 14:09:29,055 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Connecting to ResourceManager [hidden email](00000000000000000000000000000000).
2021-05-10 14:09:29,276 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default]
2021-05-10 14:09:29,289 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:09:49,314 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:09:59,325 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:09:59,327 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:19,365 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:29,363 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:29,385 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:49,425 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
2021-05-10 14:10:59,423 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [[hidden email]] has failed, address is now gated for [50] ms. Reason: [Association failed with [[hidden email]]] Caused by: [java.net.UnknownHostException: franz-01.default: Temporary failure in name resolution]
2021-05-10 14:10:59,445 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address [hidden email], retrying in 10000 ms: Could not connect to rpc endpoint under address [hidden email].
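
The repeated `java.net.UnknownHostException: franz-01.default` above suggests the TaskManager pod cannot resolve the JobManager service name. A minimal sketch for checking this outside Flink is below. It pulls the unresolved host names out of log lines like the ones above and asks the local resolver for them. The helper names `unresolved_hosts` and `resolve` are made up for illustration, and it assumes a Python interpreter is available wherever the check is run, which may not be the case in the stock Flink image.

```python
import re
import socket

# Matches the host name that follows "java.net.UnknownHostException: " in a log line.
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: ([^\s:\]]+)")


def unresolved_hosts(log_lines):
    """Return the distinct host names reported as unknown, in first-seen order."""
    seen = []
    for line in log_lines:
        match = UNKNOWN_HOST.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def resolve(host):
    """Return the sorted IP addresses the local resolver gives for host, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

Run from inside the TaskManager pod, `resolve("franz-01.default")` returning an empty list would confirm that the failure is in the cluster's DNS rather than in Flink. Comparing it with `resolve("franz-01.default.svc.cluster.local")` (assuming the cluster uses the default `cluster.local` domain) may show whether only the short name fails.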


JobManager logs:

2021-05-10 14:09:00,393 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Received JobGraph submission a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,395 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Submitting job a63f806ba9a172b728395266a6dc41fe (Flink Streaming Job).
2021-05-10 14:09:00,524 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2021-05-10 14:09:00,537 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Initializing job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,612 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,665 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Running initialization on master for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,666 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Successfully ran initialization on master in 0 ms.
2021-05-10 14:09:00,707 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 15 ms
2021-05-10 14:09:00,742 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880)
2021-05-10 14:09:00,823 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No checkpoint found during restore.
2021-05-10 14:09:00,830 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@43519311 for Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe).
2021-05-10 14:09:00,844 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl      [] - JobManager runner for job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at [hidden email].
2021-05-10 14:09:00,848 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting execution of job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) under job master id 00000000000000000000000000000000.
2021-05-10 14:09:00,851 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-05-10 14:09:00,852 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job Flink Streaming Job (a63f806ba9a172b728395266a6dc41fe) switched from state CREATED to RUNNING.
2021-05-10 14:09:00,912 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: Custom Source -> Filter -> Timestamps/Watermarks (1/1) (b10791bc97d1d772bd443abd92bf32c0) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Window(ProcessingTimeSessionWindows(60000), ProcessingTimeTrigger, SessionAggregate, PassThroughWindowFunction) -> Sink: Unnamed (1/1) (9ee57af7f96b318d16fb0784a693b481) switched from CREATED to SCHEDULED.
2021-05-10 14:09:00,928 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}]
2021-05-10 14:09:00,939 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Connecting to ResourceManager [hidden email](00000000000000000000000000000000)
2021-05-10 14:09:00,947 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Resolved ResourceManager address, beginning registration
2021-05-10 14:09:00,949 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registering job manager [hidden email] for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,009 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email] for job a63f806ba9a172b728395266a6dc41fe.
2021-05-10 14:09:01,016 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2021-05-10 14:09:01,018 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Requesting new slot [SlotRequestId{045786d740e63f4a986dc2024be7b3fc}] and profile ResourceProfile{UNKNOWN} with allocation id be6a056136c6dec065af876bda1f6dd5 from resource manager.
2021-05-10 14:09:01,020 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job a63f806ba9a172b728395266a6dc41fe with allocation id be6a056136c6dec065af876bda1f6dd5.
2021-05-10 14:09:01,029 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}, current pending count: 1.
2021-05-10 14:09:01,035 INFO  org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-05-10 14:09:01,414 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating new TaskManager pod with name franz-01-taskmanager-1-1 and resource <1728,1.0>.
2021-05-10 14:09:01,739 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod franz-01-taskmanager-1-1 is created.
2021-05-10 14:09:01,807 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received new TaskManager pod: franz-01-taskmanager-1-1
2021-05-10 14:09:01,808 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker franz-01-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=1.0, taskHeapSize=384.000mb (402653174 bytes), taskOffHeapSize=0 bytes, networkMemSize=128.000mb (134217730 bytes), managedMemSize=512.000mb (536870920 bytes)}.


Any help would be appreciated. Thanks!