bin/start-cluster.sh won't start jobmanager on master machine

bin/start-cluster.sh won't start jobmanager on master machine

Yesheng Ma
Hi all,

When I execute bin/start-cluster.sh on the master machine, the command `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` is actually executed, and it does not start the JobManager properly.

I think there might be something wrong with the `-l` argument, since running `bin/jobmanager.sh start` directly works fine. Please point out if I've misconfigured anything. Thanks!

Best,
Yesheng



Re: bin/start-cluster.sh won't start jobmanager on master machine

Nico Kruber
Hi Yesheng,
`nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit
strange, since (imho) it should use an absolute path to the Flink
directory.

To diagnose further, you could run the ssh command manually: figure out
what is being executed by calling
bash -x ./bin/start-cluster.sh
and then run that ssh command yourself, without "-n" and without
putting it in the background with "&". Then you will also see the
JobManager's stdout.
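Concretely, something along these lines (the host name and Flink path
below are placeholders, not taken from your setup):
```
# Trace every command start-cluster.sh runs; the ssh invocation for the
# master machine will show up in the -x output.
bash -x ./bin/start-cluster.sh

# Re-run the traced ssh command by hand, dropping "-n" and the trailing
# "&" so the JobManager's stdout stays attached to your terminal.
ssh master-host "nohup /bin/bash -l /path/to/flink/bin/jobmanager.sh start cluster ..."
```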

If that does not help, please log into the master manually and
execute the "nohup /bin/bash..." command there to see what is going on.

Depending on where the failure was, there may even be logs on the master
machine.
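For example (assuming the default log directory and the usual
flink-<user>-jobmanager-<host>.log naming; adjust the path to your
installation):
```
# On the master, check the Flink log directory for JobManager output:
ls -l /path/to/flink/log/
tail -n 100 /path/to/flink/log/flink-*-jobmanager-*.log
```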


Nico



Re: bin/start-cluster.sh won't start jobmanager on master machine

Yesheng Ma
Hi Nico,

Thanks for your reply. My major concern is actually the `-l` argument.
The command I executed is `nohup /bin/bash -x -l "/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster dell-01.epcc 8091`, both with and without the `-l` argument (the script in Flink's bin directory uses `-l`).

1) With the `-l` argument, the trace log is quite messy, but there is a clue: the last executed command starts a zsh shell:
```
+ . /home/ysma/.bashrc
++ case $- in
++ return
+ PATH=/home/ysma/bin:/home/ysma/.local/bin:/state/partition1/ysma/redis-4.0.8/../bin:/home/ysma/env/jdk1.8.0_151/bin:/home/ysma/env/maven/bin:/home/ysma/bin:/home/ysma/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
+ '[' -f /bin/zsh ']'
+ exec /bin/zsh -l
```
I guess that with `-l` bash runs as a login shell; my login configuration then `exec`s into zsh (my current login shell), and control never returns to the script.
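For reference, the trace above corresponds to a snippet along these
lines in my startup files (reconstructed from the -x output, not copied
from my actual config):
```
# somewhere in the login-shell startup files (reconstructed from the trace):
if [ -f /bin/zsh ]; then
    exec /bin/zsh -l    # replaces the bash process; jobmanager.sh never runs
fi
```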

2) Without the `-l` argument, everything works fine.

Therefore I suspect the problem lies either with the `-l` argument or with my shell configuration. Any ideas? Thanks!




Re: bin/start-cluster.sh won't start jobmanager on master machine

Yesheng Ma
Oh, I have figured out the problem: it has to do with my ~/.profile. At some point (I cannot remember when) I added a line there that sources my .zshrc, so the login shell always ends up in zsh.
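For anyone hitting the same issue, a safer pattern is to only switch to
zsh for interactive sessions (a sketch, to be adapted to your own
dotfiles):
```
# ~/.profile: only exec zsh when the shell is interactive, so that
# non-interactive login shells (like `bash -l script.sh`) are left alone.
case $- in
  *i*) [ -x /bin/zsh ] && exec /bin/zsh -l ;;
esac
```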
