Why available task slots are not leveraged for pipeline?

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Why available task slots are not leveraged for pipeline?

Cam Mach
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam

Reply | Threaded
Open this post in threaded view
|

Re: Why available task slots are not leveraged for pipeline?

Abhishek Jain
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains.

- Abhishek

On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam


Reply | Threaded
Open this post in threaded view
|

Re: Why available task slots are not leveraged for pipeline?

Zhu Zhu
Hi Cam,
This case is expected due to slot sharing.
A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task.
You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators.

Thanks,
Zhu Zhu

Abhishek Jain <[hidden email]> 于2019年8月12日周一 下午1:23写道:
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains.

- Abhishek

On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam


Reply | Threaded
Open this post in threaded view
|

Re: Why available task slots are not leveraged for pipeline?

Cam Mach
Hi Zhu and Abhishek,

Thanks for your response and pointers. It's correct, the count of parallelism will be the number of slot used for a pipeline. And, the number (or count) of the parallelism is also used to generate number of sub-tasks for each operator. In my case, I have parallelism of 60, it generates 60 sub-tasks for each operator. And so it'll be too much for one slot execute at least 60 sub-tasks. I am wondering if there is a way we can set number of generated sub-tasks, different than number of parallelism?

Cam Mach
Software Engineer
Tel: 206 972 2768



On Sun, Aug 11, 2019 at 10:37 PM Zhu Zhu <[hidden email]> wrote:
Hi Cam,
This case is expected due to slot sharing.
A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task.
You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators.

Thanks,
Zhu Zhu

Abhishek Jain <[hidden email]> 于2019年8月12日周一 下午1:23写道:
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains.

- Abhishek

On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam


Reply | Threaded
Open this post in threaded view
|

Re: Why available task slots are not leveraged for pipeline?

tison
Hi Cam,

If you set parallelism to 60, then you would make use of all 60 slots you have and
for you case, each slot executes a chained operator contains 13 tasks. It is not
the case one slot executes at least 60 sub-tasks.

Best,
tison.


Cam Mach <[hidden email]> 于2019年8月12日周一 下午7:55写道:
Hi Zhu and Abhishek,

Thanks for your response and pointers. It's correct, the count of parallelism will be the number of slot used for a pipeline. And, the number (or count) of the parallelism is also used to generate number of sub-tasks for each operator. In my case, I have parallelism of 60, it generates 60 sub-tasks for each operator. And so it'll be too much for one slot execute at least 60 sub-tasks. I am wondering if there is a way we can set number of generated sub-tasks, different than number of parallelism?

Cam Mach
Software Engineer
Tel: 206 972 2768



On Sun, Aug 11, 2019 at 10:37 PM Zhu Zhu <[hidden email]> wrote:
Hi Cam,
This case is expected due to slot sharing.
A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task.
You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators.

Thanks,
Zhu Zhu

Abhishek Jain <[hidden email]> 于2019年8月12日周一 下午1:23写道:
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains.

- Abhishek

On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam


Reply | Threaded
Open this post in threaded view
|

Re: Why available task slots are not leveraged for pipeline?

Zhu Zhu
Hi Cam,

Zili is correct.
Each shared slot can at most host one instance of each different task(JobVertex). So you will have at most 13 tasks in each slot.

To specify its parallelism individually, you can invoke setParallelism on each operator.

Thanks,
Zhu Zhu

Zili Chen <[hidden email]> 于2019年8月12日周一 下午8:00写道:
Hi Cam,

If you set parallelism to 60, then you would make use of all 60 slots you have and
for you case, each slot executes a chained operator contains 13 tasks. It is not
the case one slot executes at least 60 sub-tasks.

Best,
tison.


Cam Mach <[hidden email]> 于2019年8月12日周一 下午7:55写道:
Hi Zhu and Abhishek,

Thanks for your response and pointers. It's correct, the count of parallelism will be the number of slot used for a pipeline. And, the number (or count) of the parallelism is also used to generate number of sub-tasks for each operator. In my case, I have parallelism of 60, it generates 60 sub-tasks for each operator. And so it'll be too much for one slot execute at least 60 sub-tasks. I am wondering if there is a way we can set number of generated sub-tasks, different than number of parallelism?

Cam Mach
Software Engineer
Tel: 206 972 2768



On Sun, Aug 11, 2019 at 10:37 PM Zhu Zhu <[hidden email]> wrote:
Hi Cam,
This case is expected due to slot sharing.
A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task.
You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators.

Thanks,
Zhu Zhu

Abhishek Jain <[hidden email]> 于2019年8月12日周一 下午1:23写道:
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains.

- Abhishek

On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
Hello Flink expert,

I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only  5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines?

Thanks,
Cam