Flink and sketches

Flavio Pompermaier
Hi all,
I am looking for approx_count and freq_item functionality in Flink, and I'm not sure which route to take.
So far I have found two viable options:
  1. Wait for STREAMLINE to release their HLL_DISTINCT_COUNT code [1]
  2. Use the Yahoo Datasketches lib [2], following the example of Tobias Lindener [3][4] (and maybe release a better, reusable third-party lib for Flink)
What would you advise? Is there any other ongoing effort on approximate statistics?

Best,
Flavio