We have a dataset which is broad and slow: hundreds of thousands of
keys, of which a small number get an event every few seconds, some
get an event every few days, and the vast majority get an event a
few times an hour. Keeping this data on the heap for the last
couple of days is therefore not a challenge. However, one
additional challenge is that we can receive late events or
corrective data going back indefinitely, and while these are
infrequent, we need to handle them gracefully. Let's also say that
the total dataset grows too large to keep in memory economically.
One approach, of course, is a "lambda"-style architecture, where
sufficiently late events are routed to a side channel, perhaps
triggering a batch job to reprocess them.
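
Concretely, here is roughly what I mean, sketched against Flink's
DataStream API (I'm assuming Flink here, since that's where I've
seen the RocksDB state backend; the source, tuple types, and print
sinks are just placeholders):

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    public class LateSideChannel {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in source of (key, value) events; a real job would
            // read from Kafka or similar with proper event timestamps.
            DataStream<Tuple2<String, Long>> events = env
                .fromElements(Tuple2.of("k1", 5L), Tuple2.of("k2", 3L))
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                        .<Tuple2<String, Long>>forBoundedOutOfOrderness(
                            Duration.ofMinutes(5))
                        .withTimestampAssigner(
                            (e, ts) -> System.currentTimeMillis()));

            // Events that miss the allowed lateness are diverted here
            // rather than silently dropped.
            final OutputTag<Tuple2<String, Long>> lateTag =
                new OutputTag<Tuple2<String, Long>>("late-events") {};

            SingleOutputStreamOperator<Tuple2<String, Long>> sums = events
                .keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .allowedLateness(Time.days(2))  // windows stay live ~2 days
                .sideOutputLateData(lateTag)    // anything later: side channel
                .sum(1);

            sums.print();

            // The side channel; in practice this would be a durable sink
            // (files, Kafka, ...) that a scheduled batch job reads from.
            sums.getSideOutput(lateTag).print();

            env.execute("lambda-style late-event handling");
        }
    }

The batch job would then read whatever the side-channel sink wrote
and patch the affected historical windows.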
However, I'm pondering a simpler solution. I understand that with
the RocksDB state backend, the state size can exceed the heap.
Would it be a plausible approach in this situation to never purge
windows, keeping computation state back to "the beginning", so
that an arbitrarily old window (potentially years old) could
re-emit a corrected value?
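
To make the question concrete, continuing from the sketch above,
this is the shape of what I'm imagining; EmbeddedRocksDBStateBackend
is the RocksDB backend as of Flink 1.13, and the ten-year lateness
is just my stand-in for "never":

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;

    // RocksDB keeps keyed state on disk, so total state can exceed the
    // heap; incremental checkpoints keep checkpoint cost proportional
    // to what changed rather than to total state size.
    env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

    events
        .keyBy(t -> t.f0)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        // Effectively "never purge": window state survives for ten
        // years, so a late event lands in its original window, which
        // re-fires and emits a corrected aggregate downstream.
        .allowedLateness(Time.days(3650))
        .sum(1)
        .print();

One consequence I can see is that each late event would re-fire its
window, so downstream consumers would have to treat the output as
upserts keyed by (key, window) rather than append-only results.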
Thanks
_Derek