(DEPRECATED) Apache Flink User Mailing List archive.

High Job BackPressure

Posted by Sayat Satybaldiyev-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/High-Job-BackPressure-tp24903.html

Dear Flink community,

Could anyone suggest how to debug a job that shows high backpressure at the Kafka source? We have a Flink job that reads two streams from two Kafka topics and joins them via a Process Function, using the RocksDB state backend. The job is lagging significantly, about 8 hours behind, and produces incorrect results.

The Flink UI indicates that the source functions (the recommendation stream and the custom source) are backpressured, while the recommendation-click join is fine.
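For anyone who wants to capture the same back pressure reading outside the UI, here is a minimal sketch in Python that queries the JobManager REST endpoint `/jobs/<jobid>/vertices/<vertexid>/backpressure` and ranks subtasks by their reported ratio. The base URL, job ID and vertex ID are placeholders, and the JSON field names (`subtasks`, `subtask`, `ratio`, `backpressure-level` or `backpressureLevel`) are assumptions that may differ between Flink versions, so please check them against the REST documentation for your release.

```python
import json
from urllib.request import urlopen


def summarize_backpressure(payload: dict) -> list:
    """Return (subtask index, level, ratio) tuples, highest ratio first."""
    rows = []
    for sub in payload.get("subtasks", []):
        # Assumption: the level key is spelled differently across Flink versions.
        level = sub.get("backpressure-level", sub.get("backpressureLevel", "unknown"))
        rows.append((sub.get("subtask"), level, float(sub.get("ratio", 0.0))))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_backpressure(base_url: str, job_id: str, vertex_id: str) -> dict:
    """Fetch the back pressure sample for one job vertex (needs a running cluster)."""
    url = f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/backpressure"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical payload, used only to show the shape of the output.
    sample = {
        "status": "ok",
        "subtasks": [
            {"subtask": 0, "backpressure-level": "ok", "ratio": 0.05},
            {"subtask": 1, "backpressure-level": "high", "ratio": 0.9},
        ],
    }
    for row in summarize_backpressure(sample):
        print(row)
```

Note that a backpressured source usually means that something downstream of it is slow to accept records, so the same query is worth running for each vertex after the sources as well.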

I've looked through the JM and TM logs and nothing looks strange to me, except for "Kafka error sending fetch request", which occurs during a checkpoint. Checkpoints run once every 20 minutes and use almost all of the network interface's bandwidth.
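To put a rough number on the checkpoint traffic, one option is to read the checkpoint history from the REST endpoint `/jobs/<jobid>/checkpoints` and divide each checkpoint's size by its duration. The sketch below assumes the response carries a `history` list whose entries have `id`, `status`, `state_size` (bytes) and `end_to_end_duration` (milliseconds); these field names are assumptions to verify against your Flink version. The result is an average over the whole job, not a per-host or per-interface figure.

```python
def checkpoint_upload_rates(payload: dict) -> list:
    """Return (checkpoint id, average MB/s) for completed checkpoints."""
    rates = []
    for cp in payload.get("history", []):
        duration_ms = cp.get("end_to_end_duration", 0)
        # Skip failed or in-progress checkpoints and zero-length durations.
        if cp.get("status") != "COMPLETED" or duration_ms <= 0:
            continue
        size_mb = cp.get("state_size", 0) / 1e6
        rates.append((cp.get("id"), size_mb / (duration_ms / 1000.0)))
    return rates


if __name__ == "__main__":
    # Hypothetical history entries, not taken from the job in this thread.
    sample = {
        "history": [
            {"id": 7, "status": "COMPLETED",
             "state_size": 6_000_000_000, "end_to_end_duration": 60_000},
            {"id": 8, "status": "FAILED",
             "state_size": 0, "end_to_end_duration": 0},
        ]
    }
    for checkpoint_id, mb_per_s in checkpoint_upload_rates(sample):
        print(checkpoint_id, round(mb_per_s, 1), "MB/s")
```

Comparing this figure with the capacity of the network interface gives a first idea of whether the checkpoint upload alone could be starving the Kafka fetch requests.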

Please find the UI screenshots and Flink logs attached to this email.

https://drive.google.com/file/d/14h8zwC_49wxt5uNPYtM3LN6WhJ7lyeVS/view?usp=sharing

https://drive.google.com/file/d/1s6I___S7u0pBJyWdnmYaH0e_MwGr3CgY/view?usp=sharing

task_metrics.png (119K)

watermarks.png (64K)

backpressure-source2-kafka.png (41K)

checkpoint_history.png (76K)

back_pressure_reco-stream.png (32K)

backpressure-clik-join.png (29K)

overall-DAG.png (42K)

NET traffic.png (52K)