Hello Flink expert, I have a cluster with 10 Task Managers, configured with 6 task slot each, and a pipeline that has 13 tasks/operators with parallelism of 5. But when running the pipeline I observer that only 5 slots are being used, the other 55 slots are available/free. It should use all of my slots, right? since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that I missed in order to leverage all of the available slots for my pipelines? Thanks, Cam |
What you'se seeing is likely operator chaining. This is the default behaviour of grouping sub tasks to avoid transer overhead (from one slot to another). You can disable chaining if you need to. Please refer task and operator chains. - Abhishek On Mon, 12 Aug 2019 at 09:56, Cam Mach <[hidden email]> wrote:
|
Hi Cam, This case is expected due to slot sharing.A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task. You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators. Thanks, Zhu Zhu Abhishek Jain <[hidden email]> 于2019年8月12日周一 下午1:23写道:
|
Hi Zhu and Abhishek, Thanks for your response and pointers. It's correct, the count of parallelism will be the number of slot used for a pipeline. And, the number (or count) of the parallelism is also used to generate number of sub-tasks for each operator. In my case, I have parallelism of 60, it generates 60 sub-tasks for each operator. And so it'll be too much for one slot execute at least 60 sub-tasks. I am wondering if there is a way we can set number of generated sub-tasks, different than number of parallelism? Cam Mach On Sun, Aug 11, 2019 at 10:37 PM Zhu Zhu <[hidden email]> wrote:
|
Hi Cam, If you set parallelism to 60, then you would make use of all 60 slots you have and for you case, each slot executes a chained operator contains 13 tasks. It is not the case one slot executes at least 60 sub-tasks. Best, tison. Cam Mach <[hidden email]> 于2019年8月12日周一 下午7:55写道:
|
Hi Cam, Each shared slot can at most host one instance of each different task(JobVertex). So you will have at most 13 tasks in each slot. As shown in https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html#task-slots-and-resources. To specify its parallelism individually, you can invoke setParallelism on each operator. Thanks, Zhu Zhu Zili Chen <[hidden email]> 于2019年8月12日周一 下午8:00写道:
|
Free forum by Nabble | Edit this page |