Hi Jason,
If you see a back pressure warning for a task, it means the task is producing data faster than its downstream operators can consume it.
We should avoid high back pressure in online jobs because it may lead to the following problems:
1. It indicates a potential performance bottleneck and may cause high latency.
2. With aligned checkpoints, it may cause checkpoint problems (e.g. long end-to-end duration), because barrier alignment takes more time under back pressure.
Please note that the community introduced unaligned checkpoints [1], which can address long checkpoint durations caused by back pressure.
All in all, it's better to avoid back pressure. Please check document [2] to see how to deal with it. If you can tolerate the back pressure, please use unaligned checkpoints instead of aligned checkpoints to avoid long checkpoint durations caused by back pressure.
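
As a toy illustration of the definition of back pressure at the start of this reply, here is a minimal sketch in plain Java. It is not how Flink implements back pressure internally; it only models a producer and a consumer connected by a bounded buffer and counts how often the producer has to stall. The class name `BackPressureSketch`, the method `stalledTicks`, and all parameter names and values are hypothetical and chosen only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BackPressureSketch {

    /**
     * Simulates a producer and a consumer connected by a bounded buffer.
     * Returns the number of ticks in which the producer could not emit all
     * of its records because the buffer was full (i.e. it was back pressured).
     */
    static int stalledTicks(int bufferCapacity, int producePerTick,
                            int consumePerTick, int ticks) {
        Deque<Integer> buffer = new ArrayDeque<>();
        int stalled = 0;
        int nextRecord = 0;
        for (int t = 0; t < ticks; t++) {
            // Producer: emit up to producePerTick records, stopping when the buffer is full.
            int emitted = 0;
            while (emitted < producePerTick && buffer.size() < bufferCapacity) {
                buffer.addLast(nextRecord++);
                emitted++;
            }
            if (emitted < producePerTick) {
                stalled++; // the producer wanted to emit more but had to wait
            }
            // Consumer: drain up to consumePerTick records from the buffer.
            for (int i = 0; i < consumePerTick && !buffer.isEmpty(); i++) {
                buffer.removeFirst();
            }
        }
        return stalled;
    }

    public static void main(String[] args) {
        // Consumer faster than producer: the buffer never fills up.
        System.out.println("fast consumer, stalled ticks: " + stalledTicks(10, 2, 3, 100));
        // Producer faster than consumer: the buffer fills and the producer stalls.
        System.out.println("slow consumer, stalled ticks: " + stalledTicks(10, 3, 2, 100));
    }
}
```

With these toy numbers, the first call reports no stalled ticks, while in the second the producer stalls on every tick once the buffer has filled. In this model a larger buffer only delays the point at which stalling starts; it does not remove the throughput mismatch.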
Best,
JING ZHANG
Hi all,
We are running Flink on AWS Kinesis Data Analytics. Lately, after the upgrade to Flink 1.11, we have noticed that some of our apps show continuous backpressure from the moment the Flink job starts. However, we have been running these apps for a while now, and if we decrease the source parallelism to try to reduce the backpressure, the overall app throughput drops slightly compared to when the source parallelism was higher. We are wondering whether it is okay to keep the app configuration as it is (tolerating the backpressure), since the apps are pretty stable and perform well.
Thanks,
Jason