How many task managers can a cluster reasonably handle?

Antonio Verardi
Hello Flink users,

How many task managers can a Flink cluster be expected to handle reasonably?

I want to move a pretty big cluster from a setup on AWS EMR to one based on Kubernetes. I was wondering whether it makes sense to break up the beefy task managers the cluster had into something like 150 task manager containers with one slot each. A couple of different people I met at meetups told me they use this pattern in production, but I don't know whether they tried it at this scale. In your opinion, would the JobManager be able to manage so many task managers?
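A rough way to reason about this split: the total number of slots stays the same, while the number of containers and the memory per container change. The sketch below is illustrative only. The helper name, the 25-slot "beefy" layout and the 600 GB memory budget are hypothetical; `taskmanager.numberOfTaskSlots` is the Flink configuration key that sets slots per TaskManager.

```python
import math

def plan_taskmanagers(total_slots: int, slots_per_tm: int, total_memory_mb: int) -> dict:
    """Hypothetical helper: how many TaskManager containers a layout needs,
    and how much memory each gets if the budget is split evenly."""
    if total_slots <= 0 or slots_per_tm <= 0:
        raise ValueError("slot counts must be positive")
    num_tms = math.ceil(total_slots / slots_per_tm)
    memory_per_tm_mb = total_memory_mb // num_tms
    return {
        "taskmanagers": num_tms,
        "taskmanager.numberOfTaskSlots": slots_per_tm,
        "memory_per_tm_mb": memory_per_tm_mb,
    }

# Hypothetical numbers: 150 slots in total, 600 GB of TaskManager memory.
few_big = plan_taskmanagers(total_slots=150, slots_per_tm=25, total_memory_mb=600 * 1024)
many_small = plan_taskmanagers(total_slots=150, slots_per_tm=1, total_memory_mb=600 * 1024)
print(few_big)
print(many_small)
```

The trade-off shows up in the numbers: the same 150 slots become 150 small containers instead of 6 large ones, so the JobManager has many more TaskManagers to track.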

Cheers,
Antonio

Re: How many task managers can a cluster reasonably handle?

Xintong Song
Hi Antonio,

Based on our production experience, Flink can definitely handle 150 TaskManagers per cluster. In fact, we have run much larger jobs, each of which demands thousands of TaskManagers. However, as the job scale increases, it gets harder to achieve good stability: with more tasks, there is a higher chance that a single task failure causes a job failover (or a region failover, if possible). So unless your jobs are that large, I think 150 TaskManagers per cluster would be a good choice.
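The stability argument can be made concrete with a simple model, assuming each task fails independently with the same probability p within some time window (a simplification; real failures are often correlated). The chance that at least one of n tasks fails, and therefore triggers a failover, is 1 - (1 - p)^n. The value of p below is hypothetical.

```python
def prob_any_failure(num_tasks: int, p_task_failure: float) -> float:
    """Probability that at least one of num_tasks independent tasks fails,
    given each fails with probability p_task_failure in the same window."""
    if not 0.0 <= p_task_failure <= 1.0:
        raise ValueError("p_task_failure must be in [0, 1]")
    return 1.0 - (1.0 - p_task_failure) ** num_tasks

# Hypothetical per-task failure probability of 0.1% per day.
p = 0.001
for n in (150, 1500, 5000):
    print(f"{n:>5} tasks -> {prob_any_failure(n, p):.1%} chance of a failover per day")
```

Under these assumptions the chance is about 14% per day at 150 tasks but above 99% at 5000 tasks. Region failover does not change this probability; it limits how much of the job is restarted when a failure does happen.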

If you do encounter a JobManager performance bottleneck, it can usually be solved by giving the JobManager more resources, e.g. more memory via the '-jm' argument.
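For a YARN session, '-jm' is an option of bin/yarn-session.sh that sets the JobManager container memory, alongside '-tm' for TaskManager memory and '-s' for slots per TaskManager. The sketch below only assembles such a command line and does not run it; the helper name and memory sizes are hypothetical. On Kubernetes, the JobManager's memory would instead come from its Flink configuration and its container resources.

```python
def yarn_session_command(jm_memory_mb: int, tm_memory_mb: int, slots_per_tm: int) -> list:
    """Hypothetical helper: build a bin/yarn-session.sh argument list.
    Raising jm_memory_mb gives the JobManager more memory."""
    return [
        "./bin/yarn-session.sh",
        "-jm", str(jm_memory_mb),   # JobManager container memory (MB)
        "-tm", str(tm_memory_mb),   # memory per TaskManager container (MB)
        "-s", str(slots_per_tm),    # slots per TaskManager
    ]

# Hypothetical sizes: a 4 GB JobManager, 4 GB single-slot TaskManagers.
cmd = yarn_session_command(jm_memory_mb=4096, tm_memory_mb=4096, slots_per_tm=1)
print(" ".join(cmd))
```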

Thank you~

Xintong Song



On Fri, May 24, 2019 at 2:33 AM Antonio Verardi wrote:
> [...]

Re: [External] Re: How many task managers can a cluster reasonably handle?

Antonio Verardi
Thanks for the info, Xintong Song!

Cheers,
Antonio


On Fri, May 24, 2019 at 3:38 AM Xintong Song wrote:
> [...]