Bloomin fast Bloomfilters from Peloton
peloton_bloomfilter.SharedMemoryBloomfilter is the easiest to use,
fastest bloomfilter implementation for cPython.
Bloomfilters are probabilistic data structures supporting basic object
membership testing. Objects are added to bloomfilters with the
method and tested for membership with the
in operator. A
False to an
in query is guaranteed to not
have the object; a bloomfilter reporting
True likely has the tested
object subject to a tunable false positive rate.
peloton_bloomfilters implements three bloom-filter classes;
BloomFilter is a plain old bloomfilterfor single threads or gevent apps;
ThreadSafeBloomFilter releases the GIL and uses
__atomic_or_fetch to prevent lost bits during writes and
SharedMemoryBloomfilter supports the creation of bloomfilters that are shared between processes in real time using files and
To create a bloomfilter object you merely import the module and call it with two or three parameters: the file name to hold the shared memory mmap object, the capacity of the bloomfilter and its false positive rate.
>>> from peloton_bloomfilter import * >>> bf = BloomFilter(1000, 0.001) >>> tsbf = ThreadSafeBloomFilter(1000, 0.001) >>> smbf = SharedMemoryBloomfilter("/tmp/filter", 1000, 0.001)
Adding and testing membership against a bloomfilter works exactly like a set, except a bloomfilter cannot be enumerated.
>>> smbf.add(1) False >>> 1 in smbf True >>> 2 in smbf False
add returns False.
SharedMemoryBloomfilter has a
limited capacity; before each add the remaining capacity is tested an
if its insufficient the bloom-filer will be cleared prior to
performing the add and
True is returned
len() reports the number of items stored in the bloomfilter since
created or last cleared.
>>> len(smbf) 1
bloomfilters may be explicitly cleared.
>>> smbf.clear() >>> 1 in smbf False >>> len(smbf) 0
peloton_bloomfilter.SharedMemoryBloomfilter is the fastest cPython
bloomfilter implementation known to its authors. How fast? Here we
benchmark peloton_bloomfilter against pybloomfiltermmap-0.3.14 in
their ability to add and test membership for member and non-member
objects against a 1,000,000 capacity bloomfilter for varying false
Both libraries were compiled with gcc-4.8.5, CFLAGS="-mtune=native
-march=native" on Unbuntu 14.04 running on a Dell XPS 13 with 16Gb Ram
and a dual core Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz and
frequency-set -g performance. Times are in nanoseconds.
1/p peloton pybloommap 10 139 276 100 184 394 10000 431 459 1000000 693 757
1,000,000 membership tests, existing items
1/p peloton pybloommap 10 87 224 100 114 352 10000 307 424 1000000 523 671
1,000,000 membership tests, absent items
1/p peloton pybloommap 10 81 209 100 82 222 10000 102 159 1000000 119 179