Re: Subtasks - Uneven allocation

Posted by Ken Krugler
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Substasks-Uneven-allocation-tp19612p19613.html

Hi Pedro,

That’s interesting, and something we’d like to be able to control as well.

I did a little research, and it seems like (with some stunts) there could be a way to achieve this via CoLocationConstraint/CoLocationGroup magic.

The catch is that CoLocationConstraint is for ensuring that subtasks of different JobVertices are executed on the same Instance (Task Manager), not for ensuring they're spread across different Task Managers, so you'd be using it against the grain.

The only thing I found on the list was this snippet from Till, from a few years back:

If your requirement is that O_i will be executed in the same slot as P_i, then you have to add the corresponding JobVertices to a CoLocationGroup. At the moment this is not really exposed but you could try to get the JobGraph from the StreamGraph.getJobGraph and then use JobGraph.getVertices to get the JobVertices. Then you have to find out which JobVertices accommodate your operators. Once this is done, you can colocate them via the JobVertex.setStrictlyCoLocatedWith method. This might solve your problem, but I haven’t tested it myself.
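
To make that concrete, here's a rough, untested sketch of what I think Till is describing. The class name and the operator names ("Async Operator", "Anchor Operator") are placeholders, and this pokes at internal APIs, so treat it as a starting point only:

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CoLocationStunt {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // ... build the job topology on env, giving operators explicit .name(...)s ...

        // Generate the JobGraph that env.execute() would normally build internally.
        JobGraph jobGraph = env.getStreamGraph().getJobGraph();

        // Find the JobVertices holding the two operators. JobVertex names are
        // derived from operator names, so matching on a substring should work.
        JobVertex asyncVertex = null;
        JobVertex anchorVertex = null;
        for (JobVertex vertex : jobGraph.getVertices()) {
            if (vertex.getName().contains("Async Operator")) {
                asyncVertex = vertex;
            } else if (vertex.getName().contains("Anchor Operator")) {
                anchorVertex = vertex;
            }
        }

        // Force subtask i of the async vertex into the same slot as
        // subtask i of the anchor vertex.
        if (asyncVertex != null && anchorVertex != null) {
            asyncVertex.setStrictlyCoLocatedWith(anchorVertex);
        }

        // The modified JobGraph would then have to be submitted directly
        // (e.g. via a ClusterClient), since env.execute() regenerates the
        // JobGraph from the StreamGraph and would drop the constraint.
    }
}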

Hoping someone with actual knowledge of the task-to-slot allocation logic can chime in here with a solution :)

— Ken


On Apr 18, 2018, at 9:10 AM, PedroMrChaves <[hidden email]> wrote:

Hello,

I have a job with one async operator (i.e. it implements AsyncFunction). This operator spawns multiple threads that perform heavy, CPU-bound tasks.
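
For reference, a simplified sketch of the kind of operator being described, written against the ResultFuture-based AsyncFunction API (the class name, types, and pool size are all made up):

import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Each parallel subtask runs its own pool of worker threads, so whichever
// machine the subtasks land on takes on all of the CPU-bound work.
public class HeavyAsyncFunction extends RichAsyncFunction<String, String> {

    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        executor = Executors.newFixedThreadPool(8); // pool size is a placeholder
    }

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        executor.submit(() -> {
            // Stand-in for the actual heavy, CPU-bound computation.
            String result = doHeavyCpuWork(input);
            resultFuture.complete(Collections.singletonList(result));
        });
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    private String doHeavyCpuWork(String input) {
        return input;
    }
}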

I have a Flink standalone cluster deployed on two machines, each with 32 cores and 128 GB of RAM; each machine runs one Task Manager and one Job Manager. When I deploy the job, all of the subtasks from the async operator end up on the same machine, which causes it to have a much higher CPU load than the other.

I've researched ways to overcome this issue, but I haven't found a solution.
Ideally, the subtasks would be split evenly across both machines.

Can this problem be solved somehow?

Regards,
Pedro Chaves.




--------------------------------------------
+1 530-210-6378