I am trying to set up a standalone Flink cluster (1.7.1) and I'm getting an error very similar to the one the user reported in
6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:42,884 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:39:52,901 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:02,925 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:12,939 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:22,963 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:40:32,978 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.0.0.6:6123/user/resourcemanager..
2019-03-12 07:38:36,643 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@10.0.0.6:6123]
2019-03-12 07:38:36,659 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@10.0.0.6:6123
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory C:\cygwin64\tmp\blobStore-85b28100-fa08-4488-9f79-d0d712f34733
2019-03-12 07:38:36,690 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:54072 - max concurrent requests: 50 - max backlog: 1000
2019-03-12 07:38:36,705 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2019-03-12 07:38:36,721 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at 10.0.0.6:0
2019-03-12 07:38:36,737 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-03-12 07:38:36,752 INFO akka.remote.Remoting - Starting remoting
2019-03-12 07:38:36,768 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.0.0.6:54085]
2019-03-12 07:38:36,768 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink-metrics@10.0.0.6:54085
2019-03-12 07:38:36,784 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory C:\cygwin64\tmp\executionGraphStore-550bff8d-314e-4a04-b10e-93bdc7af80c6, expiration time 3600000, maximum cache size 52428800 bytes.
2019-03-12 07:38:36,815 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory C:\cygwin64\tmp\blobStore-608a5134-9f0d-44dd-8e3d-d9fbe4185d21
2019-03-12 07:38:36,830 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory C:\cygwin64\tmp\flink-web-2d9712e2-54cb-428a-a27a-826fa2214dad\flink-web-upload for file uploads.
2019-03-12 07:38:36,830 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2019-03-12 07:38:37,065 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (deprecated keys: [jobmanager.web.log.path])'.
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at 10.0.0.6:8081
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://10.0.0.6:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,034 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://10.0.0.6:8081.
2019-03-12 07:38:38,096 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-03-12 07:38:38,112 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@10.0.0.6:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-03-12 07:38:38,190 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2019-03-12 07:38:38,206 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@10.0.0.6:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-03-12 07:38:38,221 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2019-03-12 07:44:20,564 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [/10.0.0.7:51057] failed with java.io.IOException: An existing connection was forcibly closed by the remote host
2019-03-12 07:44:20,564 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-worker1:50978] has failed, address is now gated for [50] ms. Reason: [Disassociated]
Interestingly, the worker node (flink-worker1) never seems to connect to the job manager, since it keeps retrying. But when I force the task manager to close, the job manager reports an error at the end saying the association has failed. For some reason, none of the task managers managed to connect, even though port 6123 on the job manager is open and listening.
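For reference, this is roughly how the "port is open" claim can be checked from the worker node itself. It is only a minimal sketch: the helper name `can_connect` is made up for illustration, and the address 10.0.0.6:6123 is taken from the logs above. It only tests plain TCP reachability, so a successful result does not prove that the RPC endpoint accepts the address the task manager is using.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example (hypothetical usage, run on the worker node):
#   can_connect("10.0.0.6", 6123)
```

Running the check from flink-worker1 rather than from the job manager host matters here, since a firewall between the two machines would not show up in a local check.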
Any suggestions would be appreciated.