start-cluster.sh not working in HA mode

Posted by Marchant, Hayden on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/start-cluster-sh-not-working-in-HA-mode-tp16233.html

I am attempting to run Flink 1.3.2 in HA mode with ZooKeeper.

When I run start-cluster.sh, the job manager is not started, even though the task manager is. When I looked into this, I saw that the command:

ssh -n $FLINK_SSH_OPTS $master -- "nohup /bin/bash -l \"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &"

does not actually run anything on the host, i.e. I do not see "Starting jobmanager daemon on host .....".

Only when I remove ALL the quotes do I see it working, i.e. if I run:

ssh -n $FLINK_SSH_OPTS $master -- nohup /bin/bash -l ${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport} &

then the job manager does start, and I see "Starting jobmanager daemon on host .....".
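
To make the difference between the two forms concrete, here is a small sketch that prints the command string each form would hand to the remote side. It does not call ssh at all: show_remote_cmd is a hypothetical stand-in that only mimics the way ssh joins its command arguments with spaces, and the values of FLINK_BIN_DIR, master and webuiport are placeholders, not taken from my setup. It shows what differs between the two invocations, not why the quoted one fails.

```shell
#!/bin/bash
# Hypothetical stand-in for ssh: like ssh, it joins its command arguments
# with single spaces, but it prints the resulting string instead of
# sending it to a remote shell.
show_remote_cmd() {
  printf '%s\n' "$*"
}

# Placeholder values for illustration only (assumptions).
FLINK_BIN_DIR=/opt/flink/bin
master=host1
webuiport=8081

# Quoted form: the whole command, including the trailing &, is a single
# argument, so the & would be interpreted by the remote shell.
quoted_cmd=$(show_remote_cmd "nohup /bin/bash -l \"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &")

# Unquoted form: the local shell consumes the trailing & and backgrounds
# the helper itself; the & never becomes part of the command string.
# The "wait" only makes sure the backgrounded helper has finished
# before its output is captured.
unquoted_cmd=$(show_remote_cmd nohup /bin/bash -l ${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport} & wait)

echo "quoted:   ${quoted_cmd}"
echo "unquoted: ${unquoted_cmd}"
```

With the quoted form the string still contains the inner double quotes and ends in &, so the backgrounding would happen on the remote host. With the unquoted form the string has neither, and it is the local ssh process that gets put in the background.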

Has anyone else experienced a similar problem? Are there any elegant workarounds that do not require changing the Flink scripts?

Thanks,
Hayden Marchant