Community,
I am interested in knowing the recommended way of capacity planning a particular Flink application given its current resource allocation. According to the Flink documentation (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning), extra resources need to be provisioned on top of what is required for normal operation so that the application can recover when failures occur. The amount of extra headroom determines how quickly the application can catch up to the head of the input stream (e.g. Kafka), considering event-time processing.

As far as I know, the recommended way of testing the maximum capacity of the system is to slowly increase the ingestion rate and find the point just before backpressure kicks in. Would the following be a sufficient strategy for determining the maximum number of messages that can be processed: start the job at a timestamp far enough in the past that the system is forced to catch up for a few minutes, and then take the average ingress rate over that period?

Thank you in advance! Have a great day!

Regards,
M.
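[Editor's note: the sketch below illustrates the catch-up strategy described above. It is a minimal example, not the poster's actual job; the broker address, group id, and topic name are placeholders, and it assumes the universal Kafka connector available around Flink 1.10, whose consumer exposes setStartFromTimestamp().]

import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class CatchUpTestJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "capacity-test");           // placeholder group id

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props); // placeholder topic

        // Start reading e.g. 30 minutes in the past so the job is forced to
        // catch up, which pushes it to its maximum sustainable throughput.
        long startTimestamp = Instant.now().minus(Duration.ofMinutes(30)).toEpochMilli();
        consumer.setStartFromTimestamp(startTimestamp);

        DataStream<String> stream = env.addSource(consumer).name("kafka-source");

        stream.map(value -> value)              // replace with the actual pipeline under test
              .name("pipeline-under-test");

        env.execute("capacity-planning-catch-up-test");
    }
}

While the job is catching up, the source operator's numRecordsOutPerSecond metric (available through Flink's metrics system and REST API) can be used to read off the ingress rate.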
Hi Morgan,

If I understand correctly, you want to measure the maximum throughput that your Flink application can handle given a certain resource setup? Forcing Flink to catch up with the data should help with that.

Please be aware that Flink may need a warm-up period before its performance stabilizes. Depending on your workload, this could take up to tens of minutes. Please also be careful with aggregations over large windows: emitting the windows may introduce a large processing workload, causing the measured throughput to fluctuate.

Thank you~

Xintong Song
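[Editor's note: the sketch below shows one way to take the average measurement while skipping the warm-up period Xintong mentions, by polling Flink's REST API. It is a rough illustration only: the REST address, job ID, and source vertex ID are placeholders, and it assumes the aggregated subtask metrics endpoint (/jobs/<jobid>/vertices/<vertexid>/subtasks/metrics) returns a "sum" field for the requested metric; check the REST API docs for your Flink version.]

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class ThroughputSampler {

    // Placeholders: adjust to your cluster, job, and source vertex.
    private static final String REST_ADDRESS = "http://localhost:8081";
    private static final String JOB_ID = "<job-id>";
    private static final String SOURCE_VERTEX_ID = "<source-vertex-id>";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Sum numRecordsOutPerSecond across all source subtasks to get the
        // total rate at which records enter the pipeline.
        String url = REST_ADDRESS + "/jobs/" + JOB_ID + "/vertices/" + SOURCE_VERTEX_ID
                + "/subtasks/metrics?get=numRecordsOutPerSecond";

        long warmUpSeconds = 300;   // discard the first 5 minutes (warm-up)
        long sampleSeconds = 600;   // then sample for 10 minutes
        Thread.sleep(warmUpSeconds * 1000);

        List<Double> samples = new ArrayList<>();
        long end = System.currentTimeMillis() + sampleSeconds * 1000;
        while (System.currentTimeMillis() < end) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            samples.add(extractSum(body));
            Thread.sleep(10_000); // one sample every 10 seconds
        }

        double avg = samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        System.out.printf("Average catch-up throughput: %.0f records/s%n", avg);
    }

    // Crude extraction of the "sum" field from the JSON response; a real
    // client would use a JSON library instead.
    private static double extractSum(String json) {
        int i = json.indexOf("\"sum\":");
        if (i < 0) {
            return 0.0; // metric not reported yet
        }
        int start = i + "\"sum\":".length();
        int endIdx = start;
        while (endIdx < json.length() && "0123456789.Ee-+".indexOf(json.charAt(endIdx)) >= 0) {
            endIdx++;
        }
        return Double.parseDouble(json.substring(start, endIdx));
    }
}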