Hello,

I am trying to set up a standalone Flink cluster (1.7.1) and I'm getting an error very similar to the one reported in this thread. However, I believe the root cause is different: I tried starting the job manager with both start-cluster.sh and jobmanager.sh, and both failed with the same error.

The error I get on the task manager (flink-worker1) looks like this:

6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:42,884 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:52,901 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:02,925 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:12,939 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:22,963 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:32,978 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..

But the job manager seems to start up ok:

2019-03-12 07:38:36,643 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@10.0.0.6:6123]
2019-03-12 07:38:36,659 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@10.0.0.6:6123
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory C:\cygwin64\tmp\blobStore-85b28100-fa08-4488-9f79-d0d712f34733
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:54072 - max concurrent requests: 50 - max backlog: 1000
2019-03-12 07:38:36,705 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2019-03-12 07:38:36,721 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at 10.0.0.6:0
2019-03-12 07:38:36,737 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-03-12 07:38:36,752 INFO akka.remote.Remoting - Starting remoting
2019-03-12 07:38:36,768 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.0.0.6:54085]
2019-03-12 07:38:36,768 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink-metrics@10.0.0.6:54085
2019-03-12 07:38:36,784 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory C:\cygwin64\tmp\executionGraphStore-550bff8d-314e-4a04-b10e-93bdc7af80c6, expiration time 3600000, maximum cache size 52428800 bytes.
2019-03-12 07:38:36,815 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory C:\cygwin64\tmp\blobStore-608a5134-9f0d-44dd-8e3d-d9fbe4185d21
2019-03-12 07:38:36,830 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload for file uploads.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (deprecated keys: [jobmanager.web.log.path])'.
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at 10.0.0.6:8081
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://10.0.0.6:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://10.0.0.6:8081.
2019-03-12 07:38:38,096 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-03-12 07:38:38,112 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@10.0.0.6:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2019-03-12 07:38:38,206 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@10.0.0.6:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,221 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2019-03-12 07:44:20,564 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [/10.0.0.7:51057] failed with java.io.IOException: An existing connection was forcibly closed by the remote host
2019-03-12 07:44:20,564 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-worker1:50978] has failed, address is now gated for [50] ms. Reason: [Disassociated]

Interestingly, the worker node (flink-worker1) never seems to connect to the job manager, since it keeps retrying. But when I force the task manager to close, the job manager reports an error at the end saying the association has failed. For some reason, none of the task managers manage to connect, even though port 6123 on the job manager is open and listening.

Any suggestions would be appreciated.

Thanks!
Le