Task manager count goes the expand then converge process when running flink on YARN

徐涛
Hi experts,
I am running a Flink job on YARN in job cluster mode. The job is divided into 2 tasks. Here are some of the job's configs:
parallelism.default => 16
taskmanager.numberOfTaskSlots => 8
-yn => 2

When the program starts, the number of task managers does not settle immediately; it first expands and then converges. I recorded the numbers during the process:
     Task Managers    Task Slots    Available Task Slots
1.   14               104           88
2.   15               120           104
3.   16               128           112
4.   6                48            32
5.   3                24            8
6.   2                16            0

The final state is correct. There are 2 tasks with 32 subtasks in total. Due to slot sharing, 16 slots are enough, and with 8 task slots per TM, 2 TMs are needed.
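The slot arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming both tasks run at parallelism 16 in a single slot sharing group; the function name `required_task_managers` is made up for this example and is not part of Flink.

```python
import math


def required_task_managers(task_parallelisms, slots_per_tm):
    """Estimate subtasks, slots and TaskManagers for one slot sharing group.

    Hypothetical helper for illustration only, not a Flink API.
    """
    total_subtasks = sum(task_parallelisms)
    # With slot sharing, one slot can hold one subtask of each task,
    # so the slot demand equals the highest parallelism among the tasks.
    slots_needed = max(task_parallelisms)
    # Each TaskManager offers slots_per_tm slots; round up to whole TMs.
    task_managers = math.ceil(slots_needed / slots_per_tm)
    return total_subtasks, slots_needed, task_managers


# Two tasks at parallelism 16, 8 slots per TaskManager (values from this post).
print(required_task_managers([16, 16], slots_per_tm=8))  # (32, 16, 2)
```

Under these assumptions the result matches the final state in the table: 32 subtasks, 16 slots, 2 TMs.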
I have the following questions:
Since I specified yn=2, why doesn't Flink allocate 2 TMs directly instead of going through this expand-then-converge process? Why does it request up to 16 task managers? If this is not required, how can I avoid it?

Thanks a lot!

Best
Henry

Re: Task manager count goes the expand then converge process when running flink on YARN

vino yang
Hi Henry,

The behavior you describe does exist. It is a bug, but I can't remember its JIRA number.

Thanks, vino.

On Wed, Oct 24, 2018 at 11:27 PM 徐涛 <[hidden email]> wrote:

Re: Task manager count goes the expand then converge process when running flink on YARN

Till Rohrmann
Hi Henry,

Since version 1.5 you don't need to specify the number of TaskManagers to start, because the system will figure this out. Moreover, in versions 1.5.x and 1.6.x it is recommended to set the number of slots per TaskManager to 1, since we did not properly support TaskManagers with multiple task slots. The problem was that we started a separate TaskManager for every incoming slot request, even though there might still be some free slots left. This has been fixed by FLINK-9455 [1]. The fix will be released with the upcoming major Flink release, 1.7.
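As a rough illustration of why the count could peak at 16, here is a toy Python model of the two allocation strategies described above. It is not Flink's ResourceManager code: the function `peak_task_managers` and its parameters are hypothetical, and it assumes the job issues 16 slot requests (one per shared slot) that are handled one after another.

```python
def peak_task_managers(slot_requests, slots_per_tm, reuse_free_slots):
    """Toy model: count TaskManagers started while serving slot requests.

    Hypothetical helper for illustration only, not a Flink API.
    """
    task_managers = 0
    free_slots = 0
    for _ in range(slot_requests):
        if reuse_free_slots and free_slots > 0:
            # Fixed behavior: serve the request from an already started TM.
            free_slots -= 1
            continue
        # Start a new TM; one of its slots serves this request,
        # the remaining ones become free.
        task_managers += 1
        free_slots += slots_per_tm - 1
    return task_managers


# One TM per slot request (the behavior described for 1.5.x and 1.6.x).
print(peak_task_managers(16, 8, reuse_free_slots=False))  # 16
# Free slots are considered before a new TM is started.
print(peak_task_managers(16, 8, reuse_free_slots=True))   # 2
```

In this simplified model, the first strategy starts 16 TaskManagers for 16 requests, which is consistent with the peak Henry recorded, while the second needs only 2. The later drop back to 2 would then correspond to the unused TaskManagers being released, which the model does not simulate.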


Cheers,
Till

On Thu, Oct 25, 2018 at 5:58 AM vino yang <[hidden email]> wrote: