same_username@slave-node2-hostname
same_username@slave-node3-hostname
same_username@master-node1-hostname
slave-node2-hostname
slave-node3-hostname
master-node1-hostname
2019-09-12 15:56:36,625 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --------------------------------------------------------------------------------
2019-09-12 15:56:36,631 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-09-12 15:56:36,647 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum number of open file descriptors is 1048576.
2019-09-12 15:56:36,710 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 150.82.218.218
2019-09-12 15:56:36,711 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-12 15:56:36,712 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-12 15:56:36,713 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-12 15:56:36,714 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-09-12 15:56:36,715 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-09-12 15:56:36,717 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2019-09-12 15:56:37,097 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-09-12 15:56:37,221 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-09-12 15:56:37,305 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-09-12 15:56:38,142 INFO org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-09-12 15:56:38,169 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager.
2019-09-12 15:56:38,170 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager will try to connect for 10000 milliseconds before falling back to heuristics
2019-09-12 15:56:38,185 INFO org.apache.flink.runtime.net.ConnectionUtils - Retrieved new target address /150.82.218.218:6123.
2019-09-12 15:56:39,691 INFO org.apache.flink.runtime.net.ConnectionUtils - Trying to connect to address /150.82.218.218:6123
2019-09-12 15:56:39,693 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address 'salman-hpc/127.0.1.1': Invalid argument (connect failed)
2019-09-12 15:56:39,696 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/150.82.219.73': No route to host (Host unreachable)
2019-09-12 15:56:39,698 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:1e10:83f4:a33a:a208%enp5s0f1': Network is unreachable (connect failed)
2019-09-12 15:56:39,748 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/150.82.219.73': connect timed out
2019-09-12 15:56:39,750 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/0:0:0:0:0:0:0:1%lo': Network is unreachable (connect failed)
2019-09-12 15:56:39,751 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/127.0.0.1': Invalid argument (connect failed)
2019-09-12 15:56:39,753 INFO org.apache.flink.runtime.net.ConnectionUtils - Failed to connect from address '/fe80:0:0:0:1e10:83f4:a33a:a208%enp5s0f1': Network is unreachable (connect failed)
"flink-komal-taskexecutor-0-salman-hpc.log" 157L, 29954C
I'm trying to set up a 3-node Flink cluster (version 1.9) on the following machines:
Node 1 (Master) : 4 GB (3.8 GB) Core2 Duo 2.80GHz, Ubuntu 16.04 LTS
Node 2 (Slave) : 16 GB, Core i7-3.40GHz, Ubuntu 16.04 LTS
Node 3 (Slave) : 16 GB, Core i7-3.40GHz, Ubuntu 16.04 LTS
I have followed the instructions on: https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/cluster_setup.html
I have set "jobmanager.rpc.address" in conf/flink-conf.yaml in the following format: master@master-node1-hostname
The slaves are listed in conf/slaves as:
slave@slave-node2-hostname
slave@slave-node3-hostname
master@master-node1-hostname (using the master machine for task execution too)
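To double-check which user and host each entry resolves to, the entries listed above can be split mechanically. This is only a minimal sketch, assuming the `user@host` form shown above; `split_worker_entry` and `parse_workers` are hypothetical helper names and not part of Flink.

```python
from typing import List, Optional, Tuple


def split_worker_entry(entry: str) -> Tuple[Optional[str], str]:
    """Split one conf/slaves-style entry into (user, host).

    Assumption: an entry is either "host" or "user@host".
    The user part is None when no "@" is present.
    """
    entry = entry.strip()
    if "@" in entry:
        user, host = entry.split("@", 1)
        return user, host
    return None, entry


def parse_workers(text: str) -> List[Tuple[Optional[str], str]]:
    """Parse the text of a slaves file, skipping blank lines and # comments."""
    workers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        workers.append(split_worker_entry(line))
    return workers
```

Calling `parse_workers(open("conf/slaves").read())` on the master would list the (user, host) pairs as written in the file, which makes a mistyped user name or hostname easier to spot.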
However, when I run bin/start-cluster.sh on the master node, it fails to start the taskexecutor daemon on either slave node. It only starts the taskexecutor and standalonesession daemons on master@master-node1-hostname (Node 1).
I have tried both passwordless ssh and password-based ssh on all machines, but the result is the same. In the latter case, it does ask for the passwords of
slave@slave-node2-hostname and slave@slave-node3-hostname, but after that it does not display any message like "starting taskexecutor daemon on xxxx".
I switched my master node to Node 2 and set Node 1 to slave. It was able to start taskexecutor daemons on both Node 2 and Node 3 successfully but did nothing for Node 1.
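The TaskManager log in this thread shows connection attempts to 150.82.218.218:6123 failing with "No route to host" and "connect timed out", so a plain TCP reachability check between the nodes may help narrow things down. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and it only tests whether a TCP connection can be opened, not whether Flink itself is healthy.

```python
import socket
from typing import Tuple


def can_connect(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try to open a TCP connection to host:port.

    Returns (True, "connected") on success, or (False, reason) where
    reason names the error (refused, timed out, no route to host, ...).
    """
    try:
        # create_connection resolves the host name and connects;
        # the socket is closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers timeouts, refused connections, unreachable hosts
        # and name resolution failures.
        return False, f"{type(exc).__name__}: {exc}"
```

Run from each slave node, `can_connect("150.82.218.218", 6123)` would test the JobManager RPC port taken from the log above, and `can_connect("slave-node2-hostname", 22)` run from the master would test ssh reachability in the other direction.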
I'd appreciate it if you could advise on what the problem could be and how I can resolve it.
Best Regards,
Komal