Flink on kubernetes: taskmanager error

vipul singh
Hello,

I am trying to run Flink on a Kubernetes cluster using minikube and kubectl. I am following this example, which runs a Flink 1.2 cluster without problems.

I am interested in running Flink 1.5.1, but when I change the Flink version, I start to see the following exceptions in the taskmanager-controller logs:

2018-07-27 07:34:22,429 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.

2018-07-27 07:34:22,442 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.

2018-07-27 07:34:22,460 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.

2018-07-27 07:34:22,622 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'

2018-07-27 07:34:22,626 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to select the network interface and address to use by connecting to the leading JobManager.

2018-07-27 07:34:22,626 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils            - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics

2018-07-27 07:34:22,629 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Retrieved new target address taskmanager-controller-vncdz/172.17.0.7:6123.

2018-07-27 07:34:23,390 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:23,391 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:23,391 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:23,392 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:23,392 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:23,393 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:23,393 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:24,195 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:24,196 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:24,197 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:24,198 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:24,198 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:24,199 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:24,199 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:25,803 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:25,811 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:25,811 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:25,812 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:25,812 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:25,813 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:25,813 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:29,018 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Trying to connect to address taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:29,098 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:29,098 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:29,099 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:29,099 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:29,100 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/172.17.0.7': Connection refused (Connection refused)

2018-07-27 07:34:29,102 INFO  org.apache.flink.runtime.net.ConnectionUtils                  - Failed to connect from address '/127.0.0.1': Connection refused (Connection refused)

2018-07-27 07:34:32,628 WARN  org.apache.flink.runtime.net.ConnectionUtils                  - Could not connect to taskmanager-controller-vncdz/172.17.0.7:6123. Selecting a local address using heuristics.

2018-07-27 07:34:32,630 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - TaskManager will use hostname/address 'taskmanager-controller-vncdz' (172.17.0.7) for communication.

2018-07-27 07:34:32,663 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Starting AkkaRpcService at taskmanager-controller-vncdz:0.

2018-07-27 07:34:33,574 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started

2018-07-27 07:34:34,335 INFO  akka.remote.Remoting                                          - Starting remoting

2018-07-27 07:34:34,661 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@taskmanager-controller-vncdz:39769]

2018-07-27 07:34:34,698 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.

2018-07-27 07:34:34,710 INFO  org.apache.flink.runtime.blob.PermanentBlobCache              - Created BLOB cache storage directory /tmp/blobStore-376e1f5a-810b-4999-91eb-ca5292b50d12

2018-07-27 07:34:34,714 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-fb08f586-2992-4d4a-9e75-ed501bbdc4e3

2018-07-27 07:34:34,718 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig [server address: taskmanager-controller-vncdz/172.17.0.7, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]

2018-07-27 07:34:34,916 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Temporary file directory '/tmp': total 16 GB, usable 12 GB (75.00% usable)

2018-07-27 07:34:35,605 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).

2018-07-27 07:34:35,899 INFO  org.apache.flink.runtime.query.QueryableStateUtils            - Could not load Queryable State Client Proxy. Probable reason: flink-queryable-state-runtime is not in the classpath. To enable Queryable State, please move the flink-queryable-state-runtime jar from the opt to the lib folder.

2018-07-27 07:34:35,900 INFO  org.apache.flink.runtime.query.QueryableStateUtils            - Could not load Queryable State Server. Probable reason: flink-queryable-state-runtime is not in the classpath. To enable Queryable State, please move the flink-queryable-state-runtime jar from the opt to the lib folder.

2018-07-27 07:34:35,901 INFO  org.apache.flink.runtime.io.network.NetworkEnvironment        - Starting the network environment and its components.

2018-07-27 07:34:35,946 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful initialization (took 37 ms).

2018-07-27 07:34:35,988 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful initialization (took 42 ms). Listening on SocketAddress /172.17.0.7:41451.

2018-07-27 07:34:35,990 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Limiting managed memory to 0.7 of the currently free heap space (641 MB), memory will be allocated lazily.

2018-07-27 07:34:36,000 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager uses directory /tmp/flink-io-48184970-5e3d-4ae7-9ba4-40850532367a for spill files.

2018-07-27 07:34:36,008 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-adbfd785-de17-48ae-8677-cf360db1fac2

2018-07-27 07:34:36,199 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages have a max timeout of 10000 ms

2018-07-27 07:34:36,211 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .

2018-07-27 07:34:36,226 INFO  org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job leader service.

2018-07-27 07:34:36,231 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager(00000000000000000000000000000000).

2018-07-27 07:34:36,513 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:36,513 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:34:36,520 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:34:47,228 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:47,233 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:34:47,234 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:34:57,255 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:34:57,255 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:34:57,256 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:07,274 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:07,276 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:07,276 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:17,294 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:17,300 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:17,300 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:27,315 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:27,316 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:27,318 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:37,340 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:37,341 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:37,343 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:47,364 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:47,365 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:47,365 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

2018-07-27 07:35:57,385 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123

2018-07-27 07:35:57,387 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by: [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]

2018-07-27 07:35:57,387 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..

Could anyone point out what is wrong? This is my taskmanager controller file.

Also, could someone please point me to other docs, if they exist, about running Flink 1.5 end to end on Kubernetes?

Thanks,
Vipul