(DEPRECATED) Apache Flink User Mailing List archive.

Bloom filter in Flink

Gennady Gilin

Dec 13, 2016; 11:20am

Bloom filter in Flink

Hi Everyone,

I noticed that the Flink sources contain a distributed Bloom filter implementation, so I'm wondering whether anybody has used it in production at large scale (~2.5 billion items in my case) and can share their experience, or even some statistics on error rates and memory consumption.

Thanks,

Gennady
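For a rough sense of the memory question, the standard textbook sizing formulas for a Bloom filter can be applied to n = 2.5 billion items: m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for a target false-positive rate p. The sketch below is generic and not tied to Flink's class. It assumes ideal, independent hash functions and a single, non-partitioned filter. The function names `bloom_size` and `expected_fp_rate` are hypothetical.

```python
import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Return (m bits, k hash functions) for n items at target false-positive
    rate p, using the standard optimal-sizing approximations."""
    if n <= 0 or not (0.0 < p < 1.0):
        raise ValueError("need n > 0 and 0 < p < 1")
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


def expected_fp_rate(n: int, m: int, k: int) -> float:
    """Approximate false-positive rate after inserting n items into m bits with k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    n = 2_500_000_000  # item count from the question above
    for p in (0.01, 0.001):
        m, k = bloom_size(n, p)
        gib = m / 8 / 2**30
        print(f"p={p}: m={m} bits (~{gib:.2f} GiB), k={k}, "
              f"est. FP={expected_fp_rate(n, m, k):.4g}")
```

Under these assumptions, a 1% false-positive rate works out to roughly 9.6 bits per item, about 2.8 GiB for the whole filter with k = 7. Each additional factor of 10 in accuracy costs about 4.8 more bits per item.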

Fabian Hueske-2

Dec 13, 2016; 12:27pm

Re: Bloom filter in Flink

Hi Gennady,

this Bloom filter is actually not distributed; it is only used internally as an optimization to reduce the amount of data spilled by a hash join.

So it is not meant to be user-facing and is not integrated into any API.

You could of course use the code, but there might be better implementations for your purpose.

Best, Fabian
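The general idea of using a Bloom filter to cut spilled data in a hash join can be sketched as follows. This is a minimal illustration, not Flink's implementation. It assumes an inner join, where a probe-side record whose key cannot match any build-side key may be dropped instead of written to disk. The build-side keys go into a small Bloom filter, and only probe records that pass the filter are kept for spilling. Because a Bloom filter has no false negatives, no true match is ever lost. `BloomFilter` and `probe_records_to_spill` are hypothetical names. Indexes come from SHA-256 via double hashing.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k bit indexes derived by double hashing
    from one SHA-256 digest. Illustrative only."""

    def __init__(self, m_bits: int, k: int):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _indexes(self, key: bytes):
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1  # force an odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: bytes) -> None:
        for idx in self._indexes(key):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[idx >> 3] & (1 << (idx & 7)) for idx in self._indexes(key))


def probe_records_to_spill(build_keys, probe_records, m_bits=1 << 16, k=5):
    """Hypothetical helper: of the probe-side (key, value) records that would be
    spilled, keep only those whose key may match some build-side key."""
    bf = BloomFilter(m_bits, k)
    for key in build_keys:
        bf.add(key.encode())
    return [(key, val) for key, val in probe_records if bf.might_contain(key.encode())]
```

For example, `probe_records_to_spill(["a", "b"], [("a", 1), ("x", 2)])` always keeps `("a", 1)` and almost certainly drops `("x", 2)`, so only records that can still produce join results are written out.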
