Subtasks - Uneven allocation


Subtasks - Uneven allocation

PedroMrChaves
Hello,

I have a job that has one async operator (i.e., it implements
AsyncFunction). This operator spawns multiple threads that
perform heavy, CPU-bound tasks.

I have a Flink standalone cluster deployed on two machines with 32 cores and
128 GB of RAM each; each machine runs one TaskManager and one JobManager. When I
deploy the job, all of the subtasks of the async operator end up
on the same machine, which gives it a much higher CPU load than the
other.

I've researched ways to overcome this issue, but I haven't found a solution
to my problem.
Ideally, the subtasks would be evenly split across both machines.

Can this problem be solved somehow?

Regards,
Pedro Chaves.



-----
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Subtasks - Uneven allocation

Ken Krugler
Hi Pedro,

That’s interesting, and something we’d like to be able to control as well.

I did a little research, and it seems like (with some tricks) there could be a way to achieve this via CoLocationConstraint/CoLocationGroup magic.

However, CoLocationConstraint is meant to ensure that subtasks of different JobVertices are executed on the same Instance (TaskManager), not that they’re executed on different TaskManagers.

The only thing I found on the list was this snippet from Till, posted a few years back:

If your requirement is that O_i will be executed in the same slot as P_i, then you have to add the corresponding JobVertices to a CoLocationGroup. At the moment this is not really exposed but you could try to get the JobGraph from the StreamGraph.getJobGraph and then use JobGraph.getVertices to get the JobVertices. Then you have to find out which JobVertices accommodate your operators. Once this is done, you can colocate them via the JobVertex.setStrictlyCoLocatedWith method. This might solve your problem, but I haven’t tested it myself.
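Strict co-location pins O_i to the slot of P_i, so O simply inherits P's placement; co-location by itself does not spread anything across machines. The plain-Java model below illustrates this without any Flink dependency. It is a sketch under assumptions: P's placement is assumed to fill TaskManager 0 before TaskManager 1 (mirroring what Pedro observed), and the class and method names are hypothetical, not Flink API.

```java
import java.util.Arrays;

// Hypothetical model, not Flink API: a placement is an array where
// element i is the index of the TaskManager hosting subtask i.
class CoLocationModel {

    // Assumed fill-first placement: TaskManager 0 is filled before TaskManager 1.
    static int[] placeFillFirst(int parallelism, int slotsPerTm) {
        int[] tmOfSubtask = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            tmOfSubtask[i] = i / slotsPerTm;
        }
        return tmOfSubtask;
    }

    // Strict co-location: O_i runs in the same slot (hence on the same
    // TaskManager) as P_i, so O's placement is a copy of P's.
    static int[] coLocateWith(int[] leaderPlacement) {
        return leaderPlacement.clone();
    }

    public static void main(String[] args) {
        int[] p = placeFillFirst(4, 32); // 4 subtasks, 32 slots per TaskManager
        int[] o = coLocateWith(p);
        System.out.println("P: " + Arrays.toString(p)); // P: [0, 0, 0, 0]
        System.out.println("O: " + Arrays.toString(o)); // O: [0, 0, 0, 0]
    }
}
```

If P were already spread over both TaskManagers, O would follow it; the constraint copies placement rather than creating an even spread.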

Hoping someone with actual knowledge of the task to slot allocation logic can chime in here with a solution :)

— Ken


On Apr 18, 2018, at 9:10 AM, PedroMrChaves <[hidden email]> wrote:

--------------------------------------------
+1 530-210-6378


Re: Subtasks - Uneven allocation

Kien Truong
In reply to this post by PedroMrChaves

Hi Pedro,

You can try calling either .rebalance() or .shuffle() before the async operator.

Shuffle might give a better result if you have fewer tasks than parallelism.
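Both calls only decide which downstream subtask receives each record; neither changes which TaskManager a subtask runs on. A plain-Java model of the two strategies, with hypothetical class and method names and no Flink dependency (the round-robin here starts at subtask 0 for simplicity, an assumption about the starting point):

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical model of how records are spread over downstream subtasks.
class PartitioningModel {

    // rebalance(): records are handed to subtasks in round-robin order.
    static int[] roundRobin(int records, int parallelism) {
        int[] counts = new int[parallelism];
        int next = 0;
        for (int r = 0; r < records; r++) {
            counts[next]++;
            next = (next + 1) % parallelism;
        }
        return counts;
    }

    // shuffle(): each record picks a subtask uniformly at random.
    static int[] random(int records, int parallelism, long seed) {
        int[] counts = new int[parallelism];
        Random rnd = new Random(seed);
        for (int r = 0; r < records; r++) {
            counts[rnd.nextInt(parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("rebalance: " + Arrays.toString(roundRobin(10, 4))); // [3, 3, 2, 2]
        System.out.println("shuffle:   " + Arrays.toString(random(10, 4, 42L)));
    }
}
```

Either way the counts are per subtask, so if every async subtask sits on one machine, that machine still receives all of the work.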

Best regards,
Kien

On 4/18/2018 11:10 PM, PedroMrChaves wrote:

Re: Subtasks - Uneven allocation

PedroMrChaves
That is only used to split the load across all of the subtasks, which I am
already doing.
It is not related to the allocation.



-----
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Subtasks - Uneven allocation

Till Rohrmann
Hi Pedro,

Currently, Flink does not allow you to explicitly control the scheduling strategy at such a fine-grained level. The idea behind this is to achieve location transparency and to make scheduling easier.

However, there are some tricks you could play depending on the actual job. For example, given that the async operator is the operator with the highest degree of parallelism p, you could set the number of slots per TM to p / (number of TMs). That way Flink would use all of the available slots, and because a slot holds at most one subtask of the async operator, its p subtasks would be spread evenly across the TMs.
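The sizing rule can be checked with a little arithmetic. The plain-Java model below assumes a fill-first placement (TaskManager 0 is used before TaskManager 1), which mirrors what Pedro observed but is not Flink's actual scheduler code; the parallelism of 32 and the class and method names are hypothetical. In a standalone cluster the slot count per TaskManager is set with `taskmanager.numberOfTaskSlots` in `flink-conf.yaml`.

```java
import java.util.Arrays;

// Hypothetical model of the slot-sizing trick; NOT Flink's real scheduler.
class SlotSizingModel {

    // Rule of thumb: slots per TM = p / number of TMs (rounded up here).
    static int slotsPerTaskManager(int maxParallelism, int taskManagers) {
        return (maxParallelism + taskManagers - 1) / taskManagers;
    }

    // Assumed fill-first placement: each subtask takes the next free slot,
    // exhausting TM 0 before moving on to TM 1. Returns subtasks per TM.
    static int[] fillFirst(int parallelism, int slotsPerTm, int taskManagers) {
        if (parallelism > slotsPerTm * taskManagers) {
            throw new IllegalArgumentException("not enough slots for parallelism " + parallelism);
        }
        int[] perTm = new int[taskManagers];
        for (int i = 0; i < parallelism; i++) {
            perTm[i / slotsPerTm]++;
        }
        return perTm;
    }

    public static void main(String[] args) {
        int tms = 2;
        int p = 32; // hypothetical parallelism of the async operator
        // 32 slots per TM: every async subtask fits on TM 0.
        System.out.println(Arrays.toString(fillFirst(p, 32, tms))); // [32, 0]
        // p / #TMs = 16 slots per TM: the cluster has exactly p slots, all used.
        int slots = slotsPerTaskManager(p, tms);
        System.out.println(Arrays.toString(fillFirst(p, slots, tms))); // [16, 16]
    }
}
```

When p is not divisible by the number of TMs, rounding up leaves a few idle slots, so the split is then only approximately even.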

If the parallelism of the async operator is currently below p, then it might be feasible to increase it to p and to decrease the number of concurrent async calls per async operator.
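To keep the total amount of concurrent work roughly unchanged while raising parallelism, the per-subtask concurrency can be scaled down proportionally. In Flink, one per-operator limit of this kind is the `capacity` argument of `AsyncDataStream.unorderedWait`/`orderedWait`. The numbers below (16 subtasks with capacity 8, raised to 64 subtasks) are hypothetical, and the per-TM figure assumes the subtasks end up evenly spread over two TMs.

```java
// Hypothetical arithmetic for trading per-subtask concurrency for parallelism.
class AsyncCapacityModel {

    // Per-subtask capacity so that parallelism * capacity covers the old total (rounded up).
    static int capacityFor(int totalInFlight, int parallelism) {
        return (totalInFlight + parallelism - 1) / parallelism;
    }

    public static void main(String[] args) {
        int oldParallelism = 16;
        int oldCapacity = 8;
        int total = oldParallelism * oldCapacity; // 128 concurrent calls in total
        int newParallelism = 64;                  // hypothetical p = 2 TMs * 32 slots
        int newCapacity = capacityFor(total, newParallelism);
        System.out.println("capacity per subtask: " + newCapacity);             // 2
        System.out.println("total in flight: " + newParallelism * newCapacity); // 128
        // Assumes the 64 subtasks are spread evenly over 2 TaskManagers.
        System.out.println("in flight per TM: " + newParallelism * newCapacity / 2); // 64
    }
}
```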

Cheers,
Till

On Fri, Apr 20, 2018 at 2:30 PM, PedroMrChaves <[hidden email]> wrote: