How to understand slot?


Zhangrucong

While reading the scheduling code in the JobManager, I came up with the following questions:

1) How is it decided whether a job vertex is deployed into a shared slot? What is the benefit of deploying vertices in a shared slot?

2) How is the number of slots of a TaskManager decided?

3) If there are many TaskManagers, how is it decided which slot on which TaskManager to use when a new slot is allocated?

4) Is there any detailed documentation about scheduling?

Thank you in advance for any suggestions!


Re: How to understand slot?

Stephan Ewen
Hi!

There is a little bit of documentation about the scheduling and the slots




For your other questions, here are some brief answers:

1) Shared slots are very useful for pipelined execution. Shared slots mean that (in the example of MapReduce), one slot can hold one mapper and one reducer. Mappers and reducers run at the same time when using pipelined execution.

2) A good choice for the number of slots per TaskManager is the number of CPU cores of the machine it runs on.

3) For sources, Flink picks a random TaskManager (input splits are then assigned to the sources in a locality-aware manner). For all tasks after the sources, Flink tries to co-locate them with their input(s), unless they have so many inputs that co-location makes no difference (for example, each parallel reducer task has all mapper tasks as inputs).
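To make points 1) and 3) concrete, here is a minimal sketch in Python. It is a simplified model of the behaviour described above, not Flink's scheduler code: the function names, the operator names, and the one-to-one input assumption are all hypothetical, and slot capacity is ignored in the placement part.

```python
import random

def slots_needed(parallelism_by_operator, slot_sharing=True):
    """Slots a job occupies in this simplified model.

    With sharing, one slot holds one subtask of every operator, so the
    operator with the highest parallelism determines the count.
    Without sharing, every subtask needs its own slot.
    """
    values = list(parallelism_by_operator.values())
    if not values:
        return 0
    return max(values) if slot_sharing else sum(values)

def place_subtasks(parallelism, taskmanagers, seed=0):
    """Place source subtasks at random, then co-locate each downstream
    subtask with its single input (one-to-one forwarding is assumed)."""
    rng = random.Random(seed)
    placement = {}
    for i in range(parallelism):
        tm = rng.choice(taskmanagers)    # random pick for the source
        placement[("source", i)] = tm
        placement[("mapper", i)] = tm    # downstream task follows its input
    return placement

job = {"mapper": 4, "reducer": 4}
print(slots_needed(job, slot_sharing=True))    # 4
print(slots_needed(job, slot_sharing=False))   # 8
print(place_subtasks(2, ["tm-1", "tm-2", "tm-3"]))
```

In this model, a pipelined mapper/reducer job with parallelism 4 fits into 4 shared slots instead of 8 separate ones, which is the benefit described in 1).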


Greetings,
Stephan



In reply to Zhangrucong's message of Tue, Aug 18, 2015 at 9:29 AM.



Re: How to understand slot?

Zhangrucong

Hi Stephan, thanks a lot for answering.

 

3) For sources, Flink picks a random TaskManager (input splits are then assigned to the sources in a locality-aware manner). For all tasks after the sources, Flink tries to co-locate them with their input(s), unless they have so many inputs that co-location makes no difference (for example, each parallel reducer task has all mapper tasks as inputs).

 

If Flink picks a random TaskManager for sources, then in a distributed setup some TaskManagers may run many tasks while others run only a few. Doesn't that lead to an unbalanced load?

 

Thanks!

 

 

In reply to Stephan Ewen's message of August 18, 2015, 16:23 (subject: Re: How to understand slot).


Re: Re: How to understand slot?

Fabian Hueske-2
A TaskManager (TM) reserves a certain amount of memory for each slot, but CPU and I/O can be shared across slots. Hence, there might be some imbalance among TMs, but this imbalance is limited by the concept of slots, which puts an upper bound on the number of tasks that a TM can process.

Also, random assignment usually distributes tasks fairly uniformly across TMs.
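The bound can be illustrated with a small simulation. This is only a sketch under stated assumptions: `assign_randomly` and its parameters are hypothetical names, each task is assumed to occupy exactly one slot, and the random choice is made only among TMs that still have a free slot. It does not reproduce Flink's actual scheduler.

```python
import random

def assign_randomly(num_tasks, num_taskmanagers, slots_per_tm, seed=0):
    """Assign tasks to random TaskManagers that still have a free slot."""
    if num_tasks > num_taskmanagers * slots_per_tm:
        raise ValueError("not enough slots for all tasks")
    rng = random.Random(seed)
    load = {tm: 0 for tm in range(num_taskmanagers)}
    for _ in range(num_tasks):
        # Only TaskManagers with a free slot are candidates.
        free = [tm for tm, used in load.items() if used < slots_per_tm]
        load[rng.choice(free)] += 1
    return load

load = assign_randomly(num_tasks=24, num_taskmanagers=4, slots_per_tm=8)
print(load)
print("busiest TM:", max(load.values()), "of", 8, "slots")
```

However the random picks fall, no TM in this model ever holds more tasks than it has slots, so the imbalance between the busiest and the idlest TM cannot exceed the slot count.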

In reply to Zhangrucong's message of 2015-08-18, 10:38 GMT+02:00.