A Strategy for Capacity Testing

Posted by Geldenhuys, Morgan Karl on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/A-Strategy-for-Capacity-Testing-tp34607.html

Community,

I would like to know the recommended way to do capacity planning for a
particular Flink application under its current resource allocation.
According to the Flink documentation
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning),
extra resources need to be allocated, on top of what is already assigned
for normal operation, to cover failures. The amount of extra resources
determines how quickly the application can catch up to the head of the
input stream (e.g. Kafka) after a failure, assuming event-time
processing.
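
To make the relationship between headroom and catch-up time concrete, here
is a minimal back-of-the-envelope sketch. It assumes a constant input rate,
a constant maximum processing rate, and that the backlog is simply the input
that arrived while the job was down; the function name and parameters are
hypothetical and not part of Flink.

```python
def catch_up_seconds(input_rate: float, max_rate: float,
                     downtime_seconds: float) -> float:
    """Estimate how long the job needs to reach the head of the stream.

    input_rate       -- records/s arriving during normal operation (assumed constant)
    max_rate         -- records/s the job can process at most (assumed constant)
    downtime_seconds -- time the job was not processing (failure plus recovery)
    """
    if max_rate <= input_rate:
        # Without headroom the backlog never shrinks.
        return float("inf")
    # Records that piled up while the job was down.
    backlog = input_rate * downtime_seconds
    # New records keep arriving, so the backlog drains only at the spare rate.
    return backlog / (max_rate - input_rate)


# 10,000 records/s input, 15,000 records/s capacity, 60 s outage
print(catch_up_seconds(10_000, 15_000, 60))
```

Under these assumptions, doubling the spare capacity halves the catch-up
time, which is why the maximum rate is the number I need to measure.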

So, as far as I know, the recommended way of testing the maximum capacity
of the system is to slowly increase the ingestion rate until you find the
point just before backpressure kicks in.
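
The ramp-up approach described above could be sketched roughly as follows.
This is only an illustration: `is_backpressured` is a hypothetical callback
that would run the job at the given rate and report whether backpressure was
observed (for example by checking the Flink web UI or metrics); it is not a
Flink API.

```python
from typing import Callable, Optional


def find_max_sustainable_rate(start_rate: int, step: int, limit: int,
                              is_backpressured: Callable[[int], bool]
                              ) -> Optional[int]:
    """Raise the ingestion rate step by step until backpressure appears.

    Returns the highest tested rate without backpressure, or None if even
    the starting rate already causes backpressure.
    """
    last_good = None
    rate = start_rate
    while rate <= limit:
        if is_backpressured(rate):
            break  # first rate that is too high; stop ramping
        last_good = rate
        rate += step
    return last_good


# Stand-in probe for illustration: pretend backpressure starts above 12,500 records/s.
print(find_max_sustainable_rate(10_000, 1_000, 20_000, lambda r: r > 12_500))
```

The step size bounds the precision of the result, so a smaller step gives a
tighter estimate at the cost of more test runs.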

Would it be sufficient to instead start the job from a timestamp far
enough in the past that the system is forced to catch up for a few
minutes, and then take the average ingress rate over that period as the
maximum rate at which messages can be processed?
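
For clarity, the measurement I have in mind would look something like the
sketch below. It assumes I can sample a cumulative count of ingested records
at known times while the job is catching up (for example from a counter such
as an operator's numRecordsIn metric); the function name, the sample format,
and the warm-up cut-off are my own assumptions.

```python
from typing import Sequence, Tuple


def average_ingress_rate(samples: Sequence[Tuple[float, int]],
                         warmup_seconds: float = 0.0) -> float:
    """Average records/s over the catch-up window.

    samples        -- (timestamp in seconds, cumulative records ingested),
                      ordered by timestamp
    warmup_seconds -- initial period to ignore, e.g. while the job starts up
    """
    if not samples:
        raise ValueError("no samples given")
    start_time = samples[0][0] + warmup_seconds
    # Keep only the samples taken after the warm-up period.
    window = [s for s in samples if s[0] >= start_time]
    if len(window) < 2:
        raise ValueError("need at least two samples after warm-up")
    (t0, c0), (t1, c1) = window[0], window[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    # Records ingested in the window divided by the window length.
    return (c1 - c0) / (t1 - t0)


samples = [(0, 0), (10, 100_000), (20, 250_000), (30, 400_000)]
print(average_ingress_rate(samples, warmup_seconds=10))
```

The measurement is only meaningful while the job is still behind the head of
the stream; once it has caught up, the ingress rate falls back to the normal
input rate and would drag the average down.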

Thank you in advance! Have a great day!

Regards,
M.