Hi,
We have multiple jobs that need to be deployed to a Flink cluster. Parallelism varies from job to job and depends on the type of work being done, as do the memory requirements. All jobs currently use the same state backend. Since the workload handled by each job is different, the scaling pattern also varies. We run all our jobs in a single Flink cluster (7 VMs with the same instance configuration).

Most of what I have read in the Flink documentation suggests one of the following for setting the number of task slots:

1. As a rule of thumb, a good default number of task slots is the number of CPU cores. With hyper-threading, each slot then takes 2 or more hardware thread contexts. If you are doing any blocking I/O operations in a Flink job, it is suggested to have more slots than cores.

2. A Flink cluster needs exactly as many task slots as the highest parallelism used in the job. There is no need to calculate how many tasks (with varying parallelism) a program contains in total.

I did not find documentation on the task slot setting for the scenario I have described. Setting a lower value for the task slots seems to work better for jobs that process high amounts of traffic than for jobs that process lower amounts, but it is inefficient when the slots are assigned to jobs that work on lower volumes of traffic.

It seems that, depending on the workload handled by each Flink job, we would need to set up as many clusters.

1. Is this the only option available?
2. Are there any guidelines on deciding on the number of task slots in such an environment?

Thanks,
Sushruth
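The slot accounting behind the second guideline can be sketched as follows. This is a minimal illustration, assuming default slot sharing (a job then needs as many slots as its highest operator parallelism) and that slots are not shared between jobs in a session cluster. The job names, parallelism values, and the choice of 4 slots per TaskManager are hypothetical; only the 7 VMs come from the question.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    highest_parallelism: int  # highest operator parallelism used in the job


def slots_required(jobs):
    # With default slot sharing, each job occupies as many slots as its
    # highest parallelism; slots are never shared between jobs.
    return sum(job.highest_parallelism for job in jobs)


def slots_available(task_managers, slots_per_task_manager):
    # Total slots offered by the cluster.
    return task_managers * slots_per_task_manager


# Hypothetical jobs on the 7-VM cluster, one TaskManager per VM assumed.
jobs = [Job("cpu-heavy", 16), Job("io-heavy", 8), Job("low-traffic", 2)]
needed = slots_required(jobs)
offered = slots_available(task_managers=7, slots_per_task_manager=4)
print(f"needed={needed}, offered={offered}, fits={needed <= offered}")
```

With these assumed numbers the three jobs fit, but every slot has the same share of a TaskManager's resources regardless of which job it ends up serving, which is the inefficiency described above.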
Hi,
Do I understand correctly that:
1. The workload varies across the jobs but stays the same for any given job
2. With a small number of slots per TM, you are concerned about uneven resource utilization when running low- and high-intensity jobs on the same cluster simultaneously?

If so, wouldn't reducing the parallelism of the low-intensity jobs help?
Other options to consider are putting subtasks of the high-intensity job into different slot-sharing groups, or breaking operator chains explicitly [1].

There are also a number of improvements coming in the 1.13 release: [2][3][4].

I'm pulling in Till and Robert, who know this area better.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
[2] https://issues.apache.org/jira/browse/FLINK-21267
[3] https://issues.apache.org/jira/browse/FLINK-10404
[4] https://issues.apache.org/jira/browse/FLINK-14187

Regards,
Roman

On Fri, Mar 12, 2021 at 5:03 AM Sush Bankapura <[hidden email]> wrote:
> [...]
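The effect of slot-sharing groups on a job's slot demand can be sketched like this. It assumes the usual rule that a job needs, for each slot-sharing group, as many slots as the highest parallelism within that group. The operator names, parallelism values, and the group name "heavy" are hypothetical.

```python
from collections import defaultdict


def slots_for_job(operators):
    """Return the slots a job needs.

    operators: iterable of (name, parallelism, slot_sharing_group) tuples.
    Each group needs as many slots as its highest operator parallelism;
    the job needs the sum over all groups.
    """
    highest = defaultdict(int)
    for _name, parallelism, group in operators:
        highest[group] = max(highest[group], parallelism)
    return sum(highest.values())


# All operators in the default group: subtasks of different operators
# share slots, so the heavy operator competes with the light ones.
shared = [("source", 4, "default"), ("enrich", 8, "default"), ("sink", 4, "default")]

# Heavy operator moved to its own (hypothetical) group: it gets dedicated
# slots, at the price of a higher total slot demand.
isolated = [("source", 4, "default"), ("enrich", 8, "heavy"), ("sink", 4, "default")]

print(slots_for_job(shared), slots_for_job(isolated))
```

The same function shows the first suggestion as well: lowering the parallelism of a low-intensity job directly lowers the number of slots it takes from the shared cluster.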
Hi Sushruth,

if your jobs need significantly different configurations, then I would suggest thinking about dedicated clusters per job. That way you can configure each cluster to work best for its respective job. Of course, running multiple clusters instead of a single one comes at the cost of more overhead, which you pay for the multiple Flink processes.

If you don't want to or can't use per-job clusters, then there is not much else you can do to control how the resources of a session cluster are distributed among different jobs, other than what Roman has already said. The most effective way is to reduce the parallelism of the jobs which need fewer resources, or to split chains up into units which consume/require the same set of resources to run (CPU, memory). In the future, this problem will most likely be solved by FLIP-53 [1], which allows specifying resource requirements for operators and, thus, for the slots a job needs.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management

Cheers,
Till

On Fri, Mar 12, 2021 at 12:20 PM Roman Khachatryan <[hidden email]> wrote:
> [...]
Hi Roman and Till,
Thank you very much for your responses.

Regarding the workload variation across the jobs, let me put it like this:

1. We have some jobs that are CPU intensive (with only operator state being persisted), and other jobs that are not so CPU intensive but perform I/O operations.
2. The traffic for each of the above jobs keeps increasing over time as more data is streamed in.

Our understanding is that separating the two job types into two different clusters is one of the solutions:

1. Cluster #1 should have as many slots as the number of CPU cores, for the CPU-intensive job type
2. Cluster #2 should have more slots than the number of CPU cores, for the I/O-intensive job types

We will study the other options you have proposed.

Regards,
Sushruth

On 2021/03/12 12:35:13, Till Rohrmann <[hidden email]> wrote:
> [...]
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
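The two-cluster plan above can be turned into per-cluster slot settings with a small calculation. This is only a sketch: the core count of 8 per VM and the oversubscription factor of 2.0 for the I/O-bound cluster are assumptions for illustration, not values from this thread. `taskmanager.numberOfTaskSlots` is the Flink configuration key that sets the slots per TaskManager.

```python
import math


def slots_per_task_manager(cpu_cores, oversubscription=1.0):
    # CPU-bound jobs: one slot per core (factor 1.0).
    # I/O-bound jobs: more slots than cores (factor > 1.0), since tasks
    # spend part of their time blocked rather than using the CPU.
    return max(1, math.ceil(cpu_cores * oversubscription))


def config_line(slots):
    # Line as it would appear in the cluster's Flink configuration file.
    return f"taskmanager.numberOfTaskSlots: {slots}"


cores = 8  # assumed cores per VM
print("cluster #1 (CPU intensive):", config_line(slots_per_task_manager(cores)))
print("cluster #2 (I/O intensive):", config_line(slots_per_task_manager(cores, 2.0)))
```

The right factor for the I/O-bound cluster depends on how long tasks actually block, so it is best treated as a starting point to be tuned against observed CPU utilization.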