RocksDB KeyValue store


RocksDB KeyValue store

Navneeth Krishnan
Hi All,

I looked at the RocksDB KV store implementation and found that deserialization has to happen on every key lookup. In a scenario where a key lookup happens for every single message, would it still be a good idea to keep the data in the RocksDB store, or would an in-memory store/cache be more efficient? I know that if the data is kept in the KV store it is automatically redistributed on scale up/scale down and it's fault tolerant.

For example, if there are 1M user events and a user config of about 1KB is persisted in RocksDB, would the state have to be deserialized for each event? Wouldn't that create a lot of garbage?
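A hypothetical sketch (plain Java, not the Flink API) of the difference being asked about: a RocksDB-backed store keeps values as serialized bytes, so every lookup materializes a fresh object, while a heap-backed store hands back the same live object. `UserConfig`, `serializedStore`, and `heapStore` are made-up names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class StateLookupSketch {
    // Stand-in for a ~1 KB user config value.
    static class UserConfig implements Serializable {
        byte[] payload = new byte[1024];
    }

    // "RocksDB-style" store: values live as bytes; every get must deserialize.
    static Map<String, byte[]> serializedStore = new HashMap<>();
    // "Heap-style" store: values live as objects; every get is a reference lookup.
    static Map<String, UserConfig> heapStore = new HashMap<>();

    static byte[] serialize(UserConfig c) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(c);
        }
        return bos.toByteArray();
    }

    static UserConfig deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (UserConfig) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UserConfig config = new UserConfig();
        serializedStore.put("user-1", serialize(config));
        heapStore.put("user-1", config);

        // Simulate a stream of events for the same key.
        UserConfig fromBytes = null;
        for (int event = 0; event < 3; event++) {
            // Each event allocates a brand-new object (the garbage in question).
            fromBytes = deserialize(serializedStore.get("user-1"));
        }
        // The heap store returns the same object on every event, no allocation.
        UserConfig fromHeap = heapStore.get("user-1");

        System.out.println("deserialized copy is a new object: " + (fromBytes != config));
        System.out.println("heap lookup returns the same object: " + (fromHeap == config));
    }
}
```

The trade-off is exactly the one raised above: the serialized store pays allocation and CPU per lookup but keeps state off-heap and snapshot-friendly; the heap store is allocation-free on read but the state lives on the JVM heap.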

Also, is there a per-machine state store that could be used for all keys sent to that task manager?

Thanks

Re: RocksDB KeyValue store

taher koitawala-2
I believe Flink serialization is really fast, and GC behavior is much better since the Flink 1.6 release; beyond that, it depends on what you do with the state. Each task manager has its own RocksDB instance and is responsible for snapshotting its own instance upon checkpointing.
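For reference, selecting the per-TaskManager RocksDB backend and where its checkpoint snapshots go is a configuration choice; a minimal `flink-conf.yaml` sketch (the checkpoint directory is a placeholder):

```yaml
# Use RocksDB for keyed state; each TaskManager runs its own embedded instance.
state.backend: rocksdb
# Ship only the changed SST files on each checkpoint instead of the full DB.
state.backend.incremental: true
# Durable location for checkpoint snapshots (placeholder path).
state.checkpoints.dir: hdfs:///flink/checkpoints
```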

Furthermore, if you use keyed state, the load is distributed across TMs, because each TM then caters to a specific key group. If you use a list or value state, things differ depending on what you need to store and how you access or update the values.
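To illustrate how keyed state pins every key to one parallel subtask (and hence to one TM's local RocksDB instance), here is a simplified, self-contained sketch of the key-group assignment. Flink additionally murmur-hashes the key's `hashCode()` before taking the modulo; that step is omitted here, so the group numbers are illustrative only.

```java
import java.util.Arrays;

public class KeyGroupSketch {
    // Simplified: Flink murmur-hashes key.hashCode() first; we skip that step.
    static int keyGroup(String key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Same formula Flink uses to map a key group to an operator subtask index.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 4;

        for (String user : Arrays.asList("user-a", "user-b", "user-c", "user-d")) {
            int kg = keyGroup(user, maxParallelism);
            System.out.println(user + " -> key group " + kg
                    + " -> subtask " + subtaskFor(kg, maxParallelism, parallelism));
        }

        // Every occurrence of the same key lands on the same subtask, so that
        // subtask's local RocksDB instance holds all state for the key.
        int first = subtaskFor(keyGroup("user-a", maxParallelism), maxParallelism, parallelism);
        int again = subtaskFor(keyGroup("user-a", maxParallelism), maxParallelism, parallelism);
        System.out.println("same key, same subtask: " + (first == again));
    }
}
```

This is why keyed state "distributes automatically" on rescaling: key groups, not individual keys, are reassigned among the subtasks, and each group's RocksDB data moves with it.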

On Tue, Jul 30, 2019, 12:09 PM Navneeth Krishnan <[hidden email]> wrote: