http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Problem-starting-taskexecutor-daemons-in-3-node-cluster-tp29908.html
I'm trying to set up a 3-node Flink cluster (version 1.9) on the following machines:
Node 1 (Master) : 4 GB (3.8 GB) Core2 Duo 2.80GHz, Ubuntu 16.04 LTS
Node 2 (Slave) : 16 GB, Core i7-3.40GHz, Ubuntu 16.04 LTS
Node 3 (Slave) : 16 GB, Core i7-3.40GHz, Ubuntu 16.04 LTS
I have set "jobmanager.rpc.address" in conf/flink-conf.yaml in the following format: master@master-node1-hostname
The slaves are listed in conf/slaves as:
slave@slave-node2-hostname
slave@slave-node3-hostname
master@master-node1-hostname (using the master machine for task execution too)
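To rule out a malformed entry or a hostname that the master cannot resolve, the slaves file can be sanity-checked with a short script. This is only a minimal sketch: the helper names parse_slaves, resolves and report are hypothetical, it assumes one "user@host" or plain "host" entry per line, and it only ignores blank lines. It does not reproduce what bin/start-cluster.sh does internally.

```python
import socket


def parse_slaves(text):
    """Split each non-blank line into a (user, host) pair.

    Assumption: an entry is either "user@host" or just "host".
    For a plain "host" entry the user is returned as None.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        user, sep, host = line.rpartition("@")
        entries.append((user if sep else None, host))
    return entries


def resolves(host):
    """Return True if this machine can resolve the host name to an address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def report(path):
    """Print, for each entry in the file at `path`, whether its host resolves."""
    with open(path) as f:
        for user, host in parse_slaves(f.read()):
            status = "resolves" if resolves(host) else "DOES NOT resolve"
            print(f"user={user!r} host={host!r}: {status}")


# Parsing only, using the entries from this post (no network access needed).
sample = "slave@slave-node2-hostname\nslave@slave-node3-hostname\n\nmaster@master-node1-hostname\n"
parsed = parse_slaves(sample)
for user, host in parsed:
    print(user, host)
```

Calling report("conf/slaves") on the master node would show whether every listed host name resolves from there; a name that does not resolve would be one possible reason for a slave being skipped, though not the only one.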
However, when I run bin/start-cluster.sh on the master node, it fails to start the taskexecutor daemon on both slave nodes. It only starts the taskexecutor and standalonesession daemons on master@master-node1-hostname (Node 1).
I have tried both passwordless ssh and password-based ssh on all machines, but the result is the same. In the latter case, it does ask for the passwords of slave@slave-node2-hostname and slave@slave-node3-hostname, but it does not display any message like "starting taskexecutor daemon on xxxx" after that.
I then switched the master to Node 2 and made Node 1 a slave. In that setup it started the taskexecutor daemons on both Node 2 and Node 3 successfully, but did nothing for Node 1.
I'd appreciate it if you could advise on what the problem could be and how I can resolve it.
Best Regards,
Komal