In tests comparing the RocksDB state backend to the filesystem state backend we observe much lower throughput, around 10x slower. While lower throughput is expected, what's perplexing is that machine load is also very low with RocksDB, typically falling below 25% CPU with negligible IO wait (around 0.1%). Our test instances are EC2 c3.xlarge, which have 4 virtual CPUs and 7.5 GB RAM, each running a single TaskManager in YARN with 6.5 GB allocated memory per TaskManager. The instances also have 2x40 GB attached SSDs, which we have mapped to `taskmanager.tmp.dir`.
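(For reference, a minimal sketch of how a setup like this is wired up, assuming the Flink 1.x RocksDBStateBackend API; the checkpoint URI and SSD paths below are placeholders, not our actual values:)

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoints go to a durable filesystem; local RocksDB working files go to the attached SSDs.
RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
backend.setDbStoragePaths("/mnt/ssd1/rocksdb", "/mnt/ssd2/rocksdb");
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED); // the "SSD configured options" mentioned below
env.setStateBackend(backend);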
With FS state and 4 slots per TM we easily max out, with a load average around 5 or 6, so we actually need to throttle the slots down to 3. With RocksDB using Flink's SSD-tuned predefined options we see a load average of around 1. Also, load (and actual throughput) remain more or less constant no matter how many slots we use. The weak load is spread over all CPUs. Here is a sample top:

Cpu0 : 20.5%us, 0.0%sy, 0.0%ni, 79.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 18.5%us, 0.0%sy, 0.0%ni, 81.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 11.6%us, 0.7%sy, 0.0%ni, 87.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 12.5%us, 0.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st

Our pipeline uses tumbling windows, each with a ValueState keyed to a 3-tuple of one string and two ints. Each ValueState comprises a small set of tuples of around 5-7 fields each. The WindowFunction simply diffs against the set and updates state if there is a diff. Any ideas as to what the bottleneck is here? Any suggestions welcomed!

-Cliff
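(To make the state-access pattern concrete, the window logic is roughly shaped like the sketch below. MyRecord, the field layout, and the diff logic are simplified placeholders, and the state descriptor uses the older Flink 1.1-style constructor that takes a default value:)

import java.util.HashSet;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.RichWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// MyRecord is a placeholder POJO (5-7 fields) with proper equals()/hashCode().
public class DiffWindowFunction
        extends RichWindowFunction<MyRecord, MyRecord, Tuple3<String, Integer, Integer>, TimeWindow> {

    // Keyed ValueState: one small set of records per (String, int, int) key.
    private transient ValueState<HashSet<MyRecord>> previous;

    @Override
    public void open(Configuration parameters) {
        previous = getRuntimeContext().getState(new ValueStateDescriptor<>(
                "previous",
                TypeInformation.of(new TypeHint<HashSet<MyRecord>>() {}),
                null));
    }

    @Override
    public void apply(Tuple3<String, Integer, Integer> key, TimeWindow window,
                      Iterable<MyRecord> input, Collector<MyRecord> out) throws Exception {
        HashSet<MyRecord> current = new HashSet<>();
        for (MyRecord r : input) {
            current.add(r);
        }
        HashSet<MyRecord> old = previous.value();   // read from RocksDB
        if (old == null || !old.equals(current)) {
            previous.update(current);               // write to RocksDB only on a diff
            for (MyRecord r : current) {
                out.collect(r);
            }
        }
    }
}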
Hi Cliff, which Flink version are you using? Are you using event-time or processing-time windows? I suspect that your disks are "burning" (= your job is IO bound). Can you check with a tool like "iotop" how much disk IO Flink is producing? Then I would set this number in relation to the theoretical maximum of your SSDs (a good rough estimate is to use dd for that). If you find that your disk bandwidth is saturated by Flink, you could look into tuning the RocksDB settings so that it uses more memory for caching. Regards, Robert On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <[hidden email]> wrote:
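(In case it helps, a sketch of the kind of tuning Robert describes, using the RocksDB backend's OptionsFactory hook to give RocksDB larger write buffers and a larger block cache; the sizes are arbitrary examples rather than recommendations, and the exact RocksDB option setters can vary between rocksdbjni versions:)

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
backend.setOptions(new OptionsFactory() {
    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions; // keep the DB-level defaults
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        BlockBasedTableConfig table = new BlockBasedTableConfig();
        table.setBlockCacheSize(256 * 1024 * 1024);        // larger read cache (example value)
        return currentOptions
                .setWriteBufferSize(64 * 1024 * 1024)      // larger memtables (example value)
                .setMaxWriteBufferNumber(4)
                .setTableFormatConfig(table);
    }
});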
Hi Robert, I'll keep investigating. If I continue to come up empty, then I guess my next steps may be to stage some independent tests directly against RocksDB. -Cliff On Mon, Dec 5, 2016 at 5:52 AM, Robert Metzger <[hidden email]> wrote:
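(For what it's worth, a bare-bones standalone write/read test directly against RocksDB via the rocksdbjni API might look something like the sketch below; the path, value size, and iteration count are placeholders:)

import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class RocksDbMicroBench {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, "/mnt/ssd1/rocksdb-test");

        byte[] value = new byte[200];  // roughly the size of one serialized state value
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1_000_000; i++) {
            byte[] key = Integer.toString(i).getBytes();
            db.put(key, value);   // analogous to ValueState.update()
            db.get(key);          // analogous to ValueState.value()
        }
        System.out.println("elapsed ms: " + (System.currentTimeMillis() - start));
        db.close();
        // Note: an immediate read-back mostly hits the memtable; randomizing the key
        // order over a larger keyspace would exercise the disk read path harder.
    }
}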
Another Flink user using RocksDB with large state on SSDs recently posted this video for optimizing the performance of RocksDB on SSDs: https://www.youtube.com/watch?v=pvUqbIeoPzM That could be relevant for you. For how long did you look at iotop? It could be that the IO access happens in bursts, depending on how data is cached. I'll also add Stefan Richter to the conversation, maybe he has some more ideas about what we can do here. On Mon, Dec 5, 2016 at 6:19 PM, Cliff Resnick <[hidden email]> wrote: