Hi Gabriele,
I don't think you can compute an exact running median on a stream with bounded state. It would require collecting all elements of the stream, so you would basically need to put the complete stream into the ValueState. Even if the state is backed by RocksDB, the state for a specific key needs to fit on the heap when it is accessed. Moreover, the ValueState would be completely serialized and deserialized on every access, which is of course expensive if the whole stream is materialized.
So, computing an exact running median requires a lot of storage and is expensive to compute.
You might be able to implement an approximate median using adaptive histograms.
Best, Fabian
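
P.S. To make the histogram idea a bit more concrete, here is a minimal sketch in plain Java. It is only one possible reading of "adaptive histogram": a fixed number of bins, where the two closest neighboring bins are merged whenever the limit is exceeded. The class name ApproxMedianHistogram, the maxBins parameter, and the interpolation in median() are my own assumptions for illustration, not a Flink API, and I have not tuned it for accuracy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: bounded-size histogram that adapts its bins to the data. */
class ApproxMedianHistogram {
    private final int maxBins;
    // Each entry is {centroid, count}; the list is kept sorted by centroid.
    private final List<double[]> bins = new ArrayList<>();
    private long total = 0;

    ApproxMedianHistogram(int maxBins) {
        if (maxBins < 2) {
            throw new IllegalArgumentException("maxBins must be >= 2");
        }
        this.maxBins = maxBins;
    }

    /** Adds one element; memory stays bounded by maxBins. */
    void add(double value) {
        // Insert a new single-element bin at its sorted position.
        int i = 0;
        while (i < bins.size() && bins.get(i)[0] < value) {
            i++;
        }
        bins.add(i, new double[] {value, 1});
        total++;
        if (bins.size() > maxBins) {
            mergeClosestPair();
        }
    }

    /** Merges the two neighboring bins whose centroids are closest. */
    private void mergeClosestPair() {
        int best = 0;
        double bestGap = Double.MAX_VALUE;
        for (int i = 0; i + 1 < bins.size(); i++) {
            double gap = bins.get(i + 1)[0] - bins.get(i)[0];
            if (gap < bestGap) {
                bestGap = gap;
                best = i;
            }
        }
        double[] a = bins.get(best);
        double[] b = bins.remove(best + 1);
        double count = a[1] + b[1];
        a[0] = (a[0] * a[1] + b[0] * b[1]) / count; // count-weighted mean
        a[1] = count;
    }

    /** Approximate median: interpolate between the centroids around rank total/2. */
    double median() {
        if (total == 0) {
            throw new IllegalStateException("no elements added yet");
        }
        double target = total / 2.0;
        double cumulative = 0; // count of all bins before the current one
        for (int i = 0; i < bins.size(); i++) {
            double[] bin = bins.get(i);
            // Assume each bin's mass is centered on its centroid.
            double mid = cumulative + bin[1] / 2.0;
            if (target <= mid) {
                if (i == 0) {
                    return bin[0];
                }
                double[] prev = bins.get(i - 1);
                double prevMid = cumulative - prev[1] / 2.0;
                double frac = (target - prevMid) / (mid - prevMid);
                return prev[0] + frac * (bin[0] - prev[0]);
            }
            cumulative += bin[1];
        }
        return bins.get(bins.size() - 1)[0];
    }

    int binCount() {
        return bins.size();
    }
}
```

Since the object never holds more than maxBins entries, it should be small enough to keep in a ValueState per key: call add() for each incoming element and median() whenever you want to emit the current estimate. The result is exact only while fewer than maxBins distinct bins have been merged; after that it is an approximation whose quality depends on maxBins and on the data distribution.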