Hi,
I have a cluster of 4 dedicated machines (no VMs). My previous config was 1 master and 3 slaves; each machine ran either a TaskManager or the JobManager.

Now I want to reduce my cluster and have 1 master and 3 slaves, but with one machine running a JobManager and a TaskManager in parallel. I changed all conf/slaves files. When I start my cluster, everything looks fine for about 2 seconds -> one JM and 3 TMs with 8 cores/slots each. Two seconds later I see 4 TaskManagers and one JM. I can also run a job with 32 slots (4 TM * 8 slots) without any errors.

Why does my cluster have 4 TaskManagers?! All slaves files are cleaned up and contain 3 entries.

Thanks!

Marc
I start my cluster with:
bigdata@master:/usr/lib/flink-1.3.2$ ./bin/start-cluster.sh
Starting cluster.
Starting jobmanager daemon on host master.
Starting taskmanager daemon on host master.
Starting taskmanager daemon on host slave1.
Starting taskmanager daemon on host slave3.
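For reference, a conf/slaves file matching this startup output would simply list the three intended TaskManager hosts, one per line (a sketch using the hostnames from this thread):

master
slave1
slave3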
And if I stop it:
bigdata@master:/usr/lib/flink-1.3.2$ ./bin/stop-cluster.sh
Stopping taskmanager daemon (pid: 27050) on host master.
Stopping taskmanager daemon (pid: 2091) on host slave1.
Stopping taskmanager daemon (pid: 12684) on host slave3.
Stopping jobmanager daemon (pid: 26636) on host master.
My previous cluster additionally included slave5.
My current cluster does not include slave5, but the WebUI shows 4 TMs -> master, slave1, slave3 and slave5.
Hi Marc,
By chance did you edit the slaves file before shutting down the cluster? If so, then the removed worker would not be stopped and would reconnect to the restarted JobManager.

Greg
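A leftover TaskManager on the removed host can also be stopped by hand, for example (a sketch, assuming the same Flink install path on slave5):

bigdata@slave5:/usr/lib/flink-1.3.2$ ./bin/taskmanager.sh stop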
Hi Greg,
I guess I restarted the cluster too fast, combined with high CPU load inside the cluster. I tested it again a few minutes ago and there was no issue! With "$ jps" I checked whether there were any Java processes left -> there weren't.

But if the master doesn't know slave5, how can slave5 reconnect to the JobManager? That would mean the JobManager "adopts a child".

Marc
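A quick check for a leftover TaskManager process on the removed host could look like this (a sketch; the pid is a placeholder):

bigdata@slave5:~$ jps
12345 TaskManager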
Hi Marc,
the master, i.e. the JobManager, does not need to know which clients, i.e. TaskManagers, are supposed to connect to it. Indeed, only the task managers need to know where to connect, and they will try to establish that connection and re-connect when losing it.

Nico
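The address the task managers connect to comes from each machine's conf/flink-conf.yaml rather than from the slaves file; a minimal excerpt (values assumed from this setup, 6123 being the default RPC port):

jobmanager.rpc.address: master
jobmanager.rpc.port: 6123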
The scripts and the masters/slaves files are only relevant to the scripts that SSH to the machines to start/stop the processes. They have no real impact on how the processes find each other. Calling the scripts repeatedly while editing these files can start additional processes, or fail to stop all of them. In that case, you can repeatedly call stop-cluster.sh to stop the remaining processes, or SSH to the nodes and kill the processes manually.

Also: the files are only relevant on the machine where you execute the shell scripts. Editing them on other machines has no effect.
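A possible cleanup sequence along these lines (the pid and hostname are placeholders):

bigdata@master:/usr/lib/flink-1.3.2$ ./bin/stop-cluster.sh   # repeat until no daemons are reported
bigdata@master:/usr/lib/flink-1.3.2$ ssh slave5 jps          # look for a leftover TaskManager process
bigdata@master:/usr/lib/flink-1.3.2$ ssh slave5 kill 12345   # kill it by its pid if one remains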