Question about RocksDB performance tuning

Peter Huang
Hi,


I have a stateful Flink job handling 500k QPS. The job counts messages per composite key in a 10-minute tumbling window. With the memory state backend, the job runs without lag but periodically fails with OOM. With the RocksDB state backend, it shows high Kafka lag even after memory tuning. The QPS is also growing very fast. Is there good guidance on performance tuning of the RocksDB state backend for jobs with such high QPS?


Best Regards 
Peter Huang

Re: Question about RocksDB performance tuning

Yun Tang
Hi Peter

This is a general problem. You could refer to RocksDB's tuning guides [1][2], and also to Flink's built-in PredefinedOptions.java [3].
Generally speaking, increase the write buffer size to reduce write amplification, and if you find an IO bottleneck, increase the parallelism of the keyed operator to spread the pressure across more disks. Adding a Bloom filter helps reduce the cost of read amplification. Using high-performance disks would also help a lot.
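
As a rough illustration of the points above, a flink-conf.yaml fragment might look like the following. The keys are Flink's RocksDB state backend options. Every value here (buffer size, buffer count, parallelism, directory paths) is an assumed placeholder, not a figure from this thread, and should be checked against the documentation for the Flink version in use. Bloom filters are not shown, since enabling them may require a custom options factory depending on the Flink version.

```yaml
# Sketch only: all values are illustrative assumptions.
state.backend: rocksdb

# Start from a predefined profile (see PredefinedOptions.java);
# FLASH_SSD_OPTIMIZED assumes the local disks are SSDs.
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED

# Larger and more numerous write buffers (memtables), so that
# flushes to disk happen less often.
state.backend.rocksdb.writebuffer.size: 128mb
state.backend.rocksdb.writebuffer.count: 4

# Place RocksDB working directories on fast local disks (hypothetical paths).
state.backend.rocksdb.localdir: /mnt/ssd1/rocksdb,/mnt/ssd2/rocksdb

# Higher default parallelism spreads keyed state over more subtasks and disks.
parallelism.default: 64
```

Larger write buffers use more memory per state, so these settings have to be balanced against the memory available to each TaskManager.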



Best
Yun Tang

From: Peter Huang <[hidden email]>
Sent: Friday, July 3, 2020 13:31
To: user <[hidden email]>
Subject: Question about RocksDB performance tuning
 

Re: Question about RocksDB performance tuning

Peter Huang
Hi Yun,

Thanks for the info. These materials help a lot. 


Best Regards
Peter Huang

On Thu, Jul 2, 2020 at 11:36 PM Yun Tang <[hidden email]> wrote: