Help needed to increase throughput of simple flink app
Posted by ashwinkonale
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Help-needed-to-increase-throughput-of-simple-flink-app-tp39289.html
Hey guys,
I am struggling to improve the throughput of my simple Flink application. The target topology is:
read_from_kafka (byte-array deserializer) --rescale--> processFunction (Confluent Avro deserialization) -> split -> 1. data_sink, 2. dlq_sink
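For concreteness, the topology above looks roughly like this in the Flink 1.11 DataStream API. This is a sketch only: `decodeConfluentAvro`, `kafkaProps`, `dataSink`, and `dlqSink` are placeholders for my actual code, and I use a side output for the DLQ split:

```java
// Sketch only -- placeholders for the real job. Assumes Flink 1.11 + flink-connector-kafka.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Source reads raw bytes; Avro decoding happens downstream in the process function.
FlinkKafkaConsumer<byte[]> source = new FlinkKafkaConsumer<>(
        "input-topic",                                   // placeholder topic name
        new AbstractDeserializationSchema<byte[]>() {
            @Override
            public byte[] deserialize(byte[] message) { return message; }
        },
        kafkaProps);                                     // placeholder Properties

final OutputTag<byte[]> dlqTag = new OutputTag<byte[]>("dlq") {};

SingleOutputStreamOperator<GenericRecord> parsed = env
        .addSource(source)
        .rescale()                                       // network exchange between source and decoder
        .process(new ProcessFunction<byte[], GenericRecord>() {
            @Override
            public void processElement(byte[] value, Context ctx,
                                       Collector<GenericRecord> out) {
                try {
                    out.collect(decodeConfluentAvro(value)); // hypothetical decoder
                } catch (Exception e) {
                    ctx.output(dlqTag, value);               // route bad records to the DLQ
                }
            }
        });

parsed.addSink(dataSink);                                // placeholder HDFS sink
parsed.getSideOutput(dlqTag).addSink(dlqSink);           // placeholder DLQ sink
```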
Kafka traffic is pretty high
Partitions: 128
Traffic: ~500k msg/s, 50Mbps.
Flink is running on Kubernetes, writing to HDFS 3. I have ~200 CPUs and 400 GB of memory at hand. I have tried a few configurations, but I am not able to push throughput past 1 million msg/s (which I need to recover from failures). I have tried increasing parallelism a lot (up to 512), but it has very little impact on throughput. The primary metrics I am watching are the Kafka source's numRecordsOut and the message backlog. I have already raised Kafka consumer defaults such as max.poll.records. Here are the few things I have tried already.
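For reference, the consumer settings I bumped up look roughly like this (the values here are illustrative, not the exact numbers from my job):

```properties
# Illustrative values only -- not the exact numbers from my job.
max.poll.records=10000            # default is 500
fetch.min.bytes=1048576           # batch ~1 MB per fetch instead of the 1-byte default
max.partition.fetch.bytes=8388608 # allow larger per-partition fetches
```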
Try0: Check raw Kafka consumer throughput (kafka_source -> discarding_sink)
tm: 20, slots: 4, parallelism: 80
throughput: ~10 million msg/s
Try1: Disable operator chaining to introduce a network exchange.
tm: 20, slots: 4, parallelism: 80
throughput: ~1 million msg/s
I also tried increasing floating-buffers to 100 and buffers-per-channel to 64; increasing parallelism seems to have no effect.
Observation: the input/output network buffers are always at 100% utilization.
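The buffer settings I changed are the following (shown here as flink-conf.yaml entries; the keys are the Flink 1.11 names, whose defaults are 2 and 8 respectively):

```yaml
# Network buffer settings I experimented with (Flink 1.11 defaults: 2 / 8).
taskmanager.network.memory.buffers-per-channel: 64
taskmanager.network.memory.floating-buffers-per-gate: 100
```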
After this I tried various combinations of network configs, parallelism, and JVM sizes, but throughput seems to be stuck at 1 million msg/s. Can someone please help me figure out which key metrics to look at and how I can improve the situation? Happy to provide any details needed.
Flink version: 1.11.2