Hello,
I have a job that has one async operator (i.e. it implements AsyncFunction). This operator spawns multiple threads that perform heavy, CPU-bound tasks.

I have a Flink standalone cluster deployed on two machines with 32 cores and 128 GB of RAM each; each machine runs one Task Manager and one Job Manager. When I deploy the job, all of the subtasks of the async operator end up on the same machine, which gives that machine a much higher CPU load than the other.

I've researched ways to overcome this issue, but I haven't found a solution. Ideally, the subtasks would be split evenly across both machines. Can this problem be solved somehow?
Best Regards,
Pedro Chaves
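For context, the pattern described above (an async operator that hands CPU-bound work to its own threads) can be sketched in plain Java. This is a hypothetical stand-in, not Flink code: `CpuBoundAsyncSketch`, `heavyWork` and `processAll` are illustrative names, and the Flink-specific part (completing the `ResultFuture` inside `AsyncFunction.asyncInvoke`) is left out. The point it illustrates is that such threads run inside the JVM of the Task Manager hosting the subtask, so the CPU load lands on whichever machine the subtask was scheduled to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical plain-Java stand-in for an async operator doing CPU-bound work:
// each element is submitted to a dedicated thread pool and a future is returned
// instead of blocking. Completing Flink's ResultFuture is intentionally omitted.
class CpuBoundAsyncSketch {

    // Hypothetical CPU-bound task: sum of squares from 1 to n.
    static long heavyWork(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i;
        }
        return acc;
    }

    // Submits every input to the pool, then collects the results in input order.
    static List<Long> processAll(List<Integer> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (int n : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> heavyWork(n), pool));
            }
            List<Long> results = new ArrayList<>();
            for (CompletableFuture<Long> f : futures) {
                results.add(f.join()); // waits for each task to finish
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit once idle
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(10, 100, 1000), 4)); // prints [385, 338350, 333833500]
    }
}
```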
Hi Pedro,
That’s interesting, and something we’d like to be able to control as well. I did a little research, and it seems like (with some stunts) there could be a way to achieve this via CoLocationConstraint/CoLocationGroup magic. However, CoLocationConstraint is meant to ensure that subtasks of different JobVertices are executed on the same Instance (Task Manager), not that they are executed on different Task Managers. The only thing I found on the list was this snippet (from Till), from a few years back...
Hoping someone with actual knowledge of the task-to-slot allocation logic can chime in here with a solution :)

Ken
Hi Pedro, You can try to call either .rebalance() or
On 4/18/2018 11:10 PM, PedroMrChaves wrote:
> Hello, I have a job that has one async operational node (i.e. implements AsyncFunction). [...]
That is only used to split the load across all of the subtasks, which I am already doing. It is not related to the allocation.
Best Regards,
Pedro Chaves
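The distinction Pedro draws can be illustrated with a simplified toy model (hypothetical names, not Flink code): records are dealt round-robin across the downstream subtasks, roughly what `.rebalance()` does, while the Task Manager hosting each subtask has already been fixed by slot allocation. The model ignores details of Flink's actual partitioner, such as which channel it starts from.

```java
import java.util.Arrays;

// Toy model (not Flink code): round-robin partitioning spreads records evenly
// over subtasks, but per-Task-Manager load depends on where those subtasks run.
class RebalanceVsPlacement {

    // subtaskToTm[i] is the (hypothetical) TM index hosting subtask i.
    // Returns how many records each TM ends up processing.
    static int[] recordsPerTm(int records, int[] subtaskToTm, int numTms) {
        int[] perSubtask = new int[subtaskToTm.length];
        for (int r = 0; r < records; r++) {
            perSubtask[r % subtaskToTm.length]++; // round-robin over subtasks
        }
        int[] perTm = new int[numTms];
        for (int s = 0; s < subtaskToTm.length; s++) {
            perTm[subtaskToTm[s]] += perSubtask[s]; // placement decides the machine
        }
        return perTm;
    }

    public static void main(String[] args) {
        int[] allOnFirstTm = {0, 0, 0, 0}; // every subtask scheduled on TM 0
        int[] spread = {0, 1, 0, 1};       // subtasks alternate between TMs
        System.out.println(Arrays.toString(recordsPerTm(1000, allOnFirstTm, 2))); // [1000, 0]
        System.out.println(Arrays.toString(recordsPerTm(1000, spread, 2)));       // [500, 500]
    }
}
```

Even with perfectly balanced partitioning, the first placement sends all of the work to one machine.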
Hi Pedro,

Currently, Flink does not allow you to explicitly control the scheduling strategy at such a fine-grained level. The idea behind this is to achieve location transparency and to make scheduling easier. However, there are some tricks you could play depending on the actual job. For example, given that the async operator is the operator with the highest degree of parallelism p, you could set the number of slots per TM to p / (number of TMs). That way Flink would use all of the available slots, so the async subtasks would be spread evenly across the TMs. If the parallelism of the async operator is currently below p, then it might be feasible to increase it to p and to decrease the number of concurrent async calls per async operator.

Cheers,
Till

On Fri, Apr 20, 2018 at 2:30 PM, PedroMrChaves wrote:
> That is only used to split the load across all of the subtasks, [...]
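Till's sizing rule can be sketched as a toy model, assuming (as Pedro's observation suggests) that the scheduler fills one Task Manager's slots before moving on to the next. `SlotSizing`, `slotsPerTm` and `fillFirst` are hypothetical names, and p = 64 is an assumed parallelism chosen to match the 2 × 32 cores; this is not Flink's scheduler code.

```java
import java.util.Arrays;

// Toy model (not Flink's scheduler): assumes slots are filled one TM at a time.
class SlotSizing {

    // Slots per TM so that a job with parallelism p occupies every slot.
    static int slotsPerTm(int p, int numTms) {
        if (p % numTms != 0) {
            throw new IllegalArgumentException("p should be a multiple of the number of TMs");
        }
        return p / numTms;
    }

    // Places p subtasks by filling TM 0 first, then TM 1, and so on.
    // Returns the number of subtasks placed on each TM.
    static int[] fillFirst(int p, int numTms, int slotsPerTm) {
        if (p > numTms * slotsPerTm) {
            throw new IllegalArgumentException("not enough slots for parallelism " + p);
        }
        int[] placed = new int[numTms];
        int remaining = p;
        for (int tm = 0; tm < numTms && remaining > 0; tm++) {
            placed[tm] = Math.min(slotsPerTm, remaining);
            remaining -= placed[tm];
        }
        return placed;
    }

    public static void main(String[] args) {
        int slots = slotsPerTm(64, 2);                                // assumed p = 64, 2 TMs
        System.out.println(slots);                                    // 32
        System.out.println(Arrays.toString(fillFirst(64, 2, slots))); // [32, 32]
        // Oversized TMs (64 slots each): the same job lands on one machine.
        System.out.println(Arrays.toString(fillFirst(64, 2, 64)));    // [64, 0]
    }
}
```

In an actual deployment the per-TM slot count is set with `taskmanager.numberOfTaskSlots` in `flink-conf.yaml`. If the async operator's parallelism is raised, the `capacity` argument of `AsyncDataStream.unorderedWait`/`orderedWait` (the maximum number of in-flight requests per subtask) is the knob for lowering concurrent calls per subtask, keeping total concurrency (parallelism × capacity) roughly unchanged.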