We have a dataset which is broad and slow: hundreds of thousands of
keys, of which a small number get an event every few seconds, some
get an event every few days, and the vast majority get an event a
few times an hour. Keeping this data on the heap for the last
couple of days is therefore not a challenge. However, one
additional challenge is that we can receive late events or
corrective data going back indefinitely, and while these are
infrequent, we need to handle them gracefully. Let's also say that
the total dataset grows too large to keep in memory economically.
One approach, of course, is a "lambda"-style architecture, where
sufficiently late events are routed to a side channel, perhaps
triggering a batch job to reprocess them.
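
Concretely, here is roughly what I mean, sketched against Flink's
DataStream API (I'm assuming Flink here, since that's where I've
seen the RocksDB state backend; the source, tuple types, and print
sinks are just placeholders):

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    public class LateSideChannel {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in source of (key, value) events; a real job would
            // read from Kafka or similar with proper event timestamps.
            DataStream<Tuple2<String, Long>> events = env
                .fromElements(Tuple2.of("k1", 5L), Tuple2.of("k2", 3L))
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(
                            Duration.ofMinutes(5))
                        .withTimestampAssigner(
                            (e, ts) -> System.currentTimeMillis()));

            // Events that miss the allowed lateness are diverted here
            // rather than silently dropped.
            final OutputTag<Tuple2<String, Long>> lateTag =
                new OutputTag<Tuple2<String, Long>>("late-events") {};

            SingleOutputStreamOperator<Tuple2<String, Long>> sums = events
                .keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .allowedLateness(Time.days(2))  // windows stay live ~2 days
                .sideOutputLateData(lateTag)    // anything later: side channel
                .sum(1);

            sums.print();

            // The side channel; in practice this would be a durable sink
            // (files, Kafka, ...) that a scheduled batch job reads from.
            sums.getSideOutput(lateTag).print();

            env.execute("lambda-style late-event handling");
        }
    }

The batch job would then read whatever the side-channel sink wrote
and patch the affected historical windows.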
However, I'm pondering a simpler solution. I understand that with
the RocksDB state backend, the state size can exceed the heap.
Would it be a plausible approach in this situation to never purge
windows, keeping computation state back to "the beginning", so
that an arbitrarily old window (potentially years old) could
re-emit a corrected value?
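
To make the question concrete, continuing from the sketch above,
this is the shape of what I'm imagining; EmbeddedRocksDBStateBackend
is the RocksDB backend as of Flink 1.13, and the ten-year lateness
is just my stand-in for "never":

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;

    // RocksDB keeps keyed state on disk, so total state can exceed the
    // heap; incremental checkpoints keep checkpoint cost proportional
    // to what changed rather than to total state size.
    env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

    events
        .keyBy(t -> t.f0)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        // Effectively "never purge": window state survives for ten
        // years, so a late event lands in its original window, which
        // re-fires and emits a corrected aggregate downstream.
        .allowedLateness(Time.days(3650))
        .sum(1)
        .print();

One consequence I can see is that each late event would re-fire its
window, so downstream consumers would have to treat the output as
upserts keyed by (key, window) rather than append-only results.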
Thanks
_Derek