A Strategy for Capacity Testing

A Strategy for Capacity Testing

Geldenhuys, Morgan Karl
Community,

I am interested in knowing the recommended way of capacity planning for a
particular Flink application with its current resource allocation. Taking a
look at the Flink documentation
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning),
extra resources need to be allocated on top of what has already been
assigned for normal operations, to be used when failures occur. The amount of
extra resources determines how quickly the application can catch up
to the head of the input stream, e.g. Kafka, considering event-time
processing.
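
(As a back-of-the-envelope illustration with made-up numbers: if the normal
ingest rate is 10,000 msg/s and the job can sustain 15,000 msg/s with the
extra headroom, a 5-minute outage leaves a backlog of roughly 3,000,000
messages, which would take about 3,000,000 / (15,000 - 10,000) = 600 seconds,
i.e. 10 minutes, to work off.)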

So, as far as I know, the recommended way of testing the maximum capacity
of the system is to slowly increase the ingestion rate to find the point
just before backpressure kicks in.
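
One way to drive such a ramp-up would be an external load generator that
increases the publish rate in steps while watching the back pressure tab in
the Flink web UI. A minimal sketch with a plain Kafka producer (topic name,
step size and step duration are arbitrary placeholders):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class RampUpGenerator {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Raise the target rate by 1,000 msg/s every 2 minutes (values are arbitrary).
                for (int rate = 1_000; rate <= 50_000; rate += 1_000) {
                    long stepEnd = System.currentTimeMillis() + 2 * 60_000;
                    while (System.currentTimeMillis() < stepEnd) {
                        long secondEnd = System.currentTimeMillis() + 1_000;
                        for (int i = 0; i < rate; i++) {
                            producer.send(new ProducerRecord<>("input-topic", "event-" + i));
                        }
                        long sleep = secondEnd - System.currentTimeMillis();
                        if (sleep > 0) Thread.sleep(sleep); // crude pacing to roughly `rate` msg/s
                    }
                }
            }
        }
    }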

Would a strategy of starting the job at a timestamp far enough in the past
that the system is forced to catch up for a few minutes, and then taking an
average measurement of the ingress rate over this time, be sufficient for
determining the maximum number of messages that can be processed?
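
If that approach is reasonable, one way to force the catch-up (assuming the
job reads from Kafka via the universal FlinkKafkaConsumer) would be to start
the consumer from a timestamp in the past. A rough sketch, with the topic,
group id and 30-minute offset as placeholders:

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Properties;

    public class CatchUpCapacityTest {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "capacity-test");

            FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);

            // Start reading 30 minutes in the past so the job is forced to catch up
            // to the head of the topic (the offset is chosen arbitrarily here).
            long startTimestamp = Instant.now().minus(Duration.ofMinutes(30)).toEpochMilli();
            consumer.setStartFromTimestamp(startTimestamp);

            // Stand-in for the real pipeline; replace with the actual operators.
            env.addSource(consumer).name("kafka-source").print();

            env.execute("capacity-test");
        }
    }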

Thank you in advance! Have a great day!

Regards,
M.

Re: A Strategy for Capacity Testing

Xintong Song
Hi Morgan,

If I understand correctly, you want to measure the maximum throughput that your Flink application can handle with a given resource setup? I think forcing Flink to catch up on the data should help with that.

Please be aware that Flink may need a warm-up period before performance stabilizes. Depending on your workload, this could take up to tens of minutes.

Please also be careful with aggregations over large windows. Window firings can introduce large processing spikes, causing the measured throughput to fluctuate.
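
For the measurement itself, one option would be to discard the warm-up period and then sample the source's numRecordsOutPerSecond via the REST API, averaging over a window that covers several firings of your largest window. A rough sketch (JobManager address, job id, vertex id, warm-up and sampling durations are all placeholders, and the exact metric name should be checked against your job):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ThroughputSampler {
        public static void main(String[] args) throws Exception {
            String base = "http://localhost:8081";
            String jobId = "<job-id>";
            String vertexId = "<source-vertex-id>";
            String url = base + "/jobs/" + jobId + "/vertices/" + vertexId
                + "/metrics?get=0.numRecordsOutPerSecond";

            HttpClient client = HttpClient.newHttpClient();

            // Skip a warm-up period before sampling (duration chosen arbitrarily).
            Thread.sleep(10 * 60_000);

            // Sample once per second for 5 minutes; average the values afterwards.
            for (int i = 0; i < 300; i++) {
                HttpResponse<String> resp = client.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
                System.out.println(System.currentTimeMillis() + " " + resp.body());
                Thread.sleep(1_000);
            }
        }
    }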

Thank you~

Xintong Song


