I'm running a standalone cluster on Amazon EC2. According to the logs, leader election is happening, and the Flink Dashboard is up and accessible remotely. The problem is that the SocketWordCount example does not work: the local connection is refused. The Flink Dashboard reports 0 task managers, and the last line of the jobmanager log indicates "leader session null". All other akka URIs in the log file begin with "akka.tcp://flink@PUBLIC_IP/...", but the Resource Manager URI is "akka://flink/...".

http://pastebin.com/VWJM8XvW
http://pastebin.com/ZrWsbcwa

The master and slave files are populated with public IPs as well.
More information: From the master node, I can neither `telnet localhost 6123` nor `telnet <PUBLIC IP> 6123` while the cluster is apparently running; the connection is refused immediately. `netstat -n | grep 6123` prints nothing, so no server is listening on that port, even though the processes are running on all machines.

On Thu, Sep 15, 2016 at 12:41 PM, AJ Heller <[hidden email]> wrote:
Hi,

could you check what happened to your TaskManagers in the logs? There seems to be a problem with the connection from the TMs to the JM.

You're right that you don't strictly need HDFS to run a Flink job, as long as you don't want to access HDFS data or write to HDFS.

`netstat -atn` should list all TCP sockets currently in use. A socket bound to port 6123 should be among them.

Cheers,
Till

On Thu, Sep 15, 2016 at 11:20 PM, AJ Heller <[hidden email]> wrote:
Thank you Till. I was in a time crunch and rebuilt my cluster from the ground up with Hadoop installed. Everything works fine now: `netstat -pn | grep 6123` shows Flink's pid. Hadoop may be irrelevant; I can't rule out PEBKAC yet :-). Sorry. When I have time I'll attempt to reproduce the scenario, on the off chance there's a bug in there I can help dig up.

Best,
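
For anyone repeating the port check from this thread (`telnet` / `netstat` against 6123), the same test can be scripted. This is only a minimal sketch, not part of Flink: the helper name `port_is_open` and the two-second timeout are made up for illustration, and the host list is an assumption to adapt (for example by adding the master's public IP).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; roughly what `telnet host port`
    tells you, without the interactive session.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 6123 is the port checked in this thread; the host list is an
    # assumption, extend it with the addresses you want to probe.
    for host in ["localhost"]:
        state = "open" if port_is_open(host, 6123) else "closed or unreachable"
        print(f"{host}:6123 is {state}")
```

A result of "closed or unreachable" on the master itself corresponds to the "connection refused" symptom described above: nothing is listening on that port at that address.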