Flink 1.5 + resource elasticity resulting in overloaded workers

jelmer
Hi, we recently upgraded to Flink 1.6 and seem to be suffering from the issue described in this email


Our workers have 8 slots each. Some workers are fully loaded and, as a consequence, have to cope with heavy load during peak times, while other workers sit completely idle with 0 jobs assigned to their slots.

Is there any workaround for this, short of reducing the number of slots per worker?

We need to have double the slots we actually use available in order to cope with availability zone maintenance. So if we were to reduce the number of slots, we'd have to add new nodes that would then mostly sit idle.



Re: Flink 1.5 + resource elasticity resulting in overloaded workers

Vishal Santoshi
+1.
 

On Thu, Dec 20, 2018, 12:56 AM jelmer <[hidden email]> wrote:
Hi, we recently upgraded to Flink 1.6 and seem to be suffering from the issue described in this email


Our workers have 8 slots each. Some workers are fully loaded and, as a consequence, have to cope with heavy load during peak times, while other workers sit completely idle with 0 jobs assigned to their slots.

Is there any workaround for this, short of reducing the number of slots per worker?

We need to have double the slots we actually use available in order to cope with availability zone maintenance. So if we were to reduce the number of slots, we'd have to add new nodes that would then mostly sit idle.



Re: Flink 1.5 + resource elasticity resulting in overloaded workers

jelmer
In reply to this post by jelmer
@Flink committers

I get that you don't want to be aware of task managers, but would it make sense to change SlotManager (I briefly looked over the code and I think that's the code responsible for this) so that it selects slots randomly? Or, if this is not something you would want to do by default, to add an option that makes it do this?

It's not going to be perfect, but at least you would usually end up in a better spot than you do now in a standalone setup.
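
To make the suggestion concrete, here is a minimal sketch. It is not Flink's SlotManager code: the class and method names are hypothetical, and it assumes, purely for illustration, that free slots would otherwise be handed out in the order the workers registered them. It compares that in-order selection with picking free slots at random.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in Flink.
class SlotSelectionSketch {

    // Free slots in registration order: every slot of worker 0, then worker 1, ...
    // Each list entry is the id of the worker that owns the slot.
    static List<Integer> freeSlots(int workers, int slotsPerWorker) {
        List<Integer> slots = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            for (int s = 0; s < slotsPerWorker; s++) {
                slots.add(w);
            }
        }
        return slots;
    }

    // Takes the first `requested` entries of `free` and counts used slots per worker.
    private static int[] take(List<Integer> free, int workers, int requested) {
        if (requested > free.size()) {
            throw new IllegalArgumentException("not enough free slots");
        }
        int[] used = new int[workers];
        for (int i = 0; i < requested; i++) {
            used[free.get(i)]++;
        }
        return used;
    }

    // Assumed baseline: slots are handed out in registration order.
    static int[] allocateInOrder(int workers, int slotsPerWorker, int requested) {
        return take(freeSlots(workers, slotsPerWorker), workers, requested);
    }

    // Proposed alternative: shuffle the free slots first, so each request
    // gets a uniformly random free slot.
    static int[] allocateRandomly(int workers, int slotsPerWorker, int requested, Random rnd) {
        List<Integer> free = freeSlots(workers, slotsPerWorker);
        Collections.shuffle(free, rnd);
        return take(free, workers, requested);
    }

    public static void main(String[] args) {
        // 4 workers with 8 slots each, and half of the 32 slots in use,
        // mirroring the "double the slots we need" setup described above.
        System.out.println("in order: " + Arrays.toString(allocateInOrder(4, 8, 16)));
        System.out.println("random:   " + Arrays.toString(allocateRandomly(4, 8, 16, new Random())));
    }
}
```

Under the in-order assumption, two workers are full and two are idle. Random selection does not guarantee an even spread, but it makes that extreme outcome unlikely.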

