Bloom filter in Flink

Gennady Gilin

Hi Everyone,

 

I noticed that the Flink sources contain a distributed Bloom filter implementation, so I am wondering whether anybody has tried to use it in production at large scale (~2.5 billion items in my case) and can share their experience, or even some statistics about error rates and memory consumption.

 

Thanks,

Gennady

 

 

 

 

Re: Bloom filter in Flink

Fabian Hueske-2
Hi Gennady,

This Bloom filter is actually not distributed; it is only used internally as an optimization to reduce the amount of data spilled by a hash join.
So it is not meant to be user-facing and is not integrated into any API.
You could of course use the code, but there might be better implementations for your purpose.

Best, Fabian

Original message: Gennady Gilin <[hidden email]>, 2016-12-13 12:34 GMT+01:00
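
On the memory and error-rate part of the question: independent of any particular implementation, the textbook Bloom filter formulas give a rough lower bound on memory for a target false-positive rate. The sketch below is not Flink code and does not describe Flink's internal filter; the class name `BloomSizing` and its methods are hypothetical, and the numbers assume ideal, independent hash functions.

```java
// Back-of-the-envelope sizing for a standard Bloom filter.
// Hypothetical helper, not part of Flink or any library.
class BloomSizing {

    // Optimal bit count for n items at false-positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Optimal number of hash functions for m bits and n items:
    // k = (m / n) * ln 2, rounded to the nearest integer, at least 1
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Expected false-positive rate once n items have been inserted:
    // (1 - e^(-k * n / m))^k
    static double expectedFpp(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 2_500_000_000L; // item count from the original question
        for (double p : new double[] {0.01, 0.001}) {
            long m = optimalBits(n, p);
            int k = optimalHashes(n, m);
            double gib = m / 8.0 / (1024.0 * 1024.0 * 1024.0);
            System.out.printf("target p=%.3f: %d bits (%.2f GiB), k=%d, expected fpp=%.5f%n",
                    p, m, gib, k, expectedFpp(n, m, k));
        }
    }
}
```

Under these assumptions, 2.5 billion items need roughly 3 GB of bits for a 1% false-positive rate and roughly 4.5 GB for 0.1%, before any per-implementation overhead. A filter of that size usually has to be partitioned across nodes or kept off-heap, which is a separate design question from the sizing itself.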