(DEPRECATED) Apache Flink User Mailing List archive.

How to deal with apps with backpressure but with "good" performance

Jason Liu

How to deal with apps with backpressure but with "good" performance

Hi all,

We are running Flink on AWS Kinesis Data Analytics. Since our recent upgrade to Flink 1.11, we have noticed that some of our apps show continuous backpressure from the moment the Flink job starts. However, we have been running these apps for a while now, and if we decrease the source parallelism to try to reduce the backpressure, the app's overall throughput drops slightly compared to when the source parallelism was higher. We are wondering whether it's okay to keep the app configuration as it is (tolerating the backpressure), since the app is pretty stable and has good performance.

Thanks,

Jason

JING ZHANG

Re: How to deal with apps with backpressure but with "good" performance

Hi Jason,

If you see a back pressure warning for a task, this means it is producing data faster than the downstream operators can consume.
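To make "producing data faster than the downstream operators can consume" concrete, here is a minimal, deterministic sketch that models one channel as a bounded buffer between a fast producer and a slow consumer. It is a simplified illustration under stated assumptions, not Flink's actual network stack (which uses credit-based flow control over network buffers); the class name, the tick-based model, and the rates are all hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical single-channel model of back pressure: a bounded buffer sits
// between an upstream task (producer) and a downstream task (consumer).
public class BackpressureSimulation {

    /** Counters collected over one simulated run. */
    public static final class Result {
        public final long produced;
        public final long consumed;
        public final long blockedTicks;
        public final int finalQueueSize;

        Result(long produced, long consumed, long blockedTicks, int finalQueueSize) {
            this.produced = produced;
            this.consumed = consumed;
            this.blockedTicks = blockedTicks;
            this.finalQueueSize = finalQueueSize;
        }
    }

    /**
     * Each tick, the producer tries to emit produceRate records but stops as soon
     * as the buffer is full; then the consumer drains up to consumeRate records.
     */
    public static Result simulate(int capacity, int produceRate, int consumeRate, int ticks) {
        ArrayDeque<Long> buffer = new ArrayDeque<>();
        long produced = 0;
        long consumed = 0;
        long blockedTicks = 0;
        for (int t = 0; t < ticks; t++) {
            boolean blocked = false;
            for (int i = 0; i < produceRate; i++) {
                if (buffer.size() >= capacity) {
                    blocked = true; // no free buffer space: upstream has to wait
                    break;
                }
                buffer.addLast(produced++);
            }
            if (blocked) {
                blockedTicks++;
            }
            // Downstream processes at its own (slower) pace.
            for (int i = 0; i < consumeRate && !buffer.isEmpty(); i++) {
                buffer.pollFirst();
                consumed++;
            }
        }
        return new Result(produced, consumed, blockedTicks, buffer.size());
    }

    public static void main(String[] args) {
        Result r = simulate(10, 2, 1, 100);
        System.out.printf("produced=%d consumed=%d blockedTicks=%d queued=%d%n",
                r.produced, r.consumed, r.blockedTicks, r.finalQueueSize);
    }
}
```

Running `main` prints `produced=109 consumed=100 blockedTicks=91 queued=9`: once the buffer fills, the producer is throttled to the consumer's rate and the buffer stays almost full. In this toy model, a checkpoint barrier emitted by the producer at that point would queue behind about 9 records and need about 9 ticks to reach the consumer, which is the kind of delay that makes barrier alignment slow under back pressure.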

We should avoid high back pressure in online jobs because it may lead to the following problems:

1. It points to a potential performance bottleneck and may cause high latency.

2. With aligned checkpoints, it may cause checkpoint problems (e.g., long end-to-end duration), because barrier alignment takes longer under back pressure.

Please note that the community introduced unaligned checkpoints [1], which can address long checkpoint durations caused by back pressure.

All in all, it's better to avoid back pressure; please check the document [2] to see how to deal with it. If you can tolerate the back pressure, please use unaligned checkpoints instead of aligned checkpoints to avoid long checkpoint durations caused by back pressure.
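As a rough sketch of what switching to unaligned checkpoints could look like in application code, the snippet below uses `CheckpointConfig#enableUnalignedCheckpoints()` (available since Flink 1.11) together with exactly-once checkpointing, which unaligned checkpoints require. The class name, the `configure` helper, the 60-second interval, and the placeholder pipeline are assumptions for illustration only. The same setting can also be expressed as `execution.checkpointing.unaligned: true` in the Flink configuration. On Kinesis Data Analytics, the service's own checkpoint configuration may take precedence over values set in code, so it is worth checking the KDA documentation before relying on this.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointSketch {

    /** Hypothetical helper: enables exactly-once checkpointing with unaligned barriers. */
    public static void configure(StreamExecutionEnvironment env, long intervalMillis) {
        // Unaligned checkpoints are only supported in EXACTLY_ONCE mode
        // (and with at most one concurrent checkpoint, which is the default).
        env.enableCheckpointing(intervalMillis, CheckpointingMode.EXACTLY_ONCE);
        CheckpointConfig config = env.getCheckpointConfig();
        // Let barriers overtake in-flight buffers; the overtaken data is stored
        // as part of the checkpoint instead of being processed before the barrier.
        config.enableUnalignedCheckpoints();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        configure(env, 60_000L); // 60 s interval chosen arbitrarily for illustration
        // Placeholder pipeline so the job has at least one operator to run.
        env.fromElements(1, 2, 3).print();
        env.execute("unaligned-checkpoints-sketch");
    }
}
```

The trade-off is that checkpoints no longer wait for in-flight data to drain, so their duration stays short under back pressure, but each checkpoint can become larger because the buffered records are persisted with it.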

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/checkpoints/#unaligned-checkpoints

[2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html#backpressure

Best,

JING ZHANG

In reply to Jason Liu <[hidden email]>, Thu, Jun 17, 2021 at 8:40 AM.