Hello, I'm running a Job on AWS EMR with the TableAPI that does a long series of Joins, GroupBys, and Aggregates and I'd like to know how to best tune parallelism. In my case, I have 8 EMR core nodes setup each with 4vCores and 8Gib of memory. There's a job we have to run that has ~30 table operators. Given this, how should I calculate what to set the systems parallelism to? I also plan on running a second job on the same system, but just with 6 operators. Will this change the calculation for parallelism at all? Thanks! -- Rex Fenley | Software Engineer - Mobile and Backend Remind.com | BLOG | FOLLOW US | LIKE US |
Hi Rex, as a rule of thumb I recommend configuring your TMs with as many slots as they have cores. So in your case your cluster would have 32 slots. Then depending on the workload of your jobs you should distribute them across both jobs (so that the total adds up to 32). A high number of operators does not necessarily mean that it needs more slots since operators can share the same slot. It mostly depends on the workload of your job. If the job should be too slow, then you would have to increase the cluster resources. Cheers, Till On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <[hidden email]> wrote:
|
Great, thanks! So just to confirm, configure # of task slots to # of core nodes x # of vCPUs? I'm not sure what you mean by "distribute them across both jobs (so that the total adds up to 32)". Is it configurable how many task slots a job can receive, so in this case I'd provide ~30/36 * 32 task slots for one job and ~6/36 * 32 for another job, but even them out to sum to 32 task slots? Thanks On Fri, Nov 6, 2020 at 10:01 AM Till Rohrmann <[hidden email]> wrote:
-- Rex Fenley | Software Engineer - Mobile and Backend Remind.com | BLOG | FOLLOW US | LIKE US |
Hi Rex, If you have a cluster with 4 nodes and 8 slots each, then you have a total of 32 slots. Now if you have a job A which you start with a parallelism of 20, then you have 12 slots left. Hence, you could make use of these 12 slots by starting a job B with a parallelism 12. Cheers, Till On Fri, Nov 6, 2020 at 7:20 PM Rex Fenley <[hidden email]> wrote:
|
Awesome, thanks! On Sat, Nov 7, 2020 at 6:43 AM Till Rohrmann <[hidden email]> wrote:
-- Rex Fenley | Software Engineer - Mobile and Backend Remind.com | BLOG | FOLLOW US | LIKE US |
Free forum by Nabble | Edit this page |