Hi Team,
I am trying to increase the throughput of my Flink streaming job, which reads from a Kafka source and sinks to S3. It currently runs fine for small event records, but records with large payloads are processed extremely slowly, at a rate of about 2 TPS. Could you share some best practices for tuning? Also, can we increase parallel processing beyond the number of Kafka partitions that we have, without causing any overhead?

Regards,
Vijay
Hi,

> Also, can we increase parallel processing, beyond the number of kafka partitions that we have, without causing any overhead?

Yes. The Kafka sources add a tiny bit of overhead, but the potential benefit of running downstream operators at a higher parallelism can be much bigger.

How large is a large payload in your case?

Best practices: try to understand what is causing the slowdown: Kafka or S3? You can run a test where you read from Kafka and write into a discarding sink. Likewise, use a data generator source and write into S3. Then do the math on your job to find its theoretical limits: https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines

Hope this helps,
Robert

On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav <[hidden email]> wrote:
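[Editor's note: a minimal sketch of the discarding-sink test Robert describes, assuming the universal Kafka connector (FlinkKafkaConsumer) from Flink 1.11 and placeholder broker/topic/group names. If this job is fast, the bottleneck is the S3 sink rather than the Kafka read path.]

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaReadOnlyThroughputTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder broker
        props.setProperty("group.id", "throughput-test");      // placeholder group

        // Read from the real topic, but discard every record.
        // This isolates the Kafka read path from the S3 write path.
        env.addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props))
           .addSink(new DiscardingSink<>());

        env.execute("kafka-read-only-throughput-test");
    }
}

The mirror-image test (a data generator source writing into the real S3 sink) isolates the write path the same way.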
Hi Robert,

Thanks for the information. Payloads so far are 400 KB per record. To achieve higher parallelism at the downstream operators, do I rebalance the Kafka stream? Could you give me an example, please?

Regards,
Vijay

On Fri, Aug 14, 2020 at 12:50 PM Robert Metzger <[hidden email]> wrote:
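[Editor's note: a minimal sketch of what rebalancing the stream could look like, assuming a topic with 8 partitions; the uppercase map and discarding sink are stand-ins for the real transformation and S3 sink.]

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RebalanceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder broker
        props.setProperty("group.id", "rebalance-example");    // placeholder group

        // Source parallelism is effectively capped by the partition count;
        // extra source subtasks would just sit idle.
        DataStream<String> fromKafka = env
            .addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props))
            .setParallelism(8); // e.g. the topic has 8 partitions

        fromKafka
            .rebalance() // round-robin records across all downstream subtasks
            .map((MapFunction<String, String>) String::toUpperCase) // stand-in for the expensive per-record work
            .setParallelism(32) // downstream runs wider than the source
            .addSink(new DiscardingSink<>()) // stand-in for the S3 sink
            .setParallelism(32);

        env.execute("rebalance-example");
    }
}

Note that rebalance() introduces a network shuffle between the source and the downstream operators, so each record is serialized and transferred once more; with 400 KB records that extra cost is worth measuring before committing to it.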
Hi,

Do you think there can be any issue with Flink's performance at record sizes of 400 KB up to 1 MB? My Spark streaming job seems to be doing better. Are there any recommended configurations, or ways of increasing parallelism, to improve Flink streaming with the Flink Kafka connector?

Regards,
Vijay

On Fri, Aug 14, 2020 at 2:04 PM Vijayendra Yadav <[hidden email]> wrote: