dynamic-pybloom Release 3.2.0

Dynamic Pybloom: A Suite of Probabilistic Data Structures

Keywords
data structures, bloom filter, bloom, filter, probabilistic, set
License
MIT
Install
```
pip install dynamic-pybloom==3.2.0
```

dynamic-pybloom

`dynamic-pybloom` is a fork of the popular https://travis-ci.org/jaybaird/python-bloomfilter module. It includes a Bloom filter data structure, an implementation of the Scalable Bloom Filter, and a new implementation of the Dynamic Bloom Filter.

Bloom filters work well when you know up front how many bits to set aside to store your entire set. Scalable Bloom Filters let the filter's bit count grow as a function of false-positive probability and size. Dynamic Bloom Filters also grow like a Scalable Bloom Filter, but they preserve the ability to intersect or union with one another.
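The union/intersection property comes from plain Bloom filters: two filters built with the same bit count and the same hash functions can be combined with a bitwise OR (union) or AND (intersection) of their bit arrays. A toy sketch of that idea, not this library's implementation (the bit count, hash scheme, and function names here are illustrative):

```python
import hashlib

M = 64  # number of bits (illustrative)
K = 3   # number of hash functions (illustrative)

def positions(item, m=M, k=K):
    # Derive k bit positions from a SHA-256 digest of the item.
    digest = hashlib.sha256(str(item).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def make_filter(items, m=M):
    bits = [0] * m
    for item in items:
        for p in positions(item):
            bits[p] = 1
    return bits

def contains(bits, item):
    # May report false positives, never false negatives.
    return all(bits[p] for p in positions(item))

a = make_filter(["apple", "pear"])
b = make_filter(["pear", "plum"])

# Same-sized filters with identical hash functions union via bitwise OR.
union = [x | y for x, y in zip(a, b)]
assert all(contains(union, w) for w in ["apple", "pear", "plum"])
```

A Scalable Bloom Filter loses this property because its internal filters differ in size across instances; the Dynamic Bloom Filter keeps every internal filter at a fixed base size so the OR/AND trick still applies.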

A filter is "full" when it reaches its capacity: M * (ln 2)^2 / |ln p|, where M is the number of bits and p is the false-positive probability. When capacity is reached, a new filter is created that is exponentially larger than the last, with a tighter false-positive probability and a larger number of hash functions.
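The capacity formula above can be checked with a few lines of arithmetic. This sketch (the helper names are illustrative, not part of the library's API) sizes a filter for 1000 elements at p = 0.001 and then recovers the capacity from the bit count:

```python
import math

def required_bits(n, p):
    # Bits needed to store n elements at false-positive probability p:
    #   m = -n * ln(p) / (ln 2)^2
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def capacity(m_bits, p):
    # Inverse of the above, the formula from the text:
    #   n = m * (ln 2)^2 / |ln p|
    return int(m_bits * (math.log(2) ** 2) / abs(math.log(p)))

m = required_bits(1000, 0.001)  # about 14378 bits
print(m, capacity(m, 0.001))    # the round trip recovers ~1000
```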

installation

either clone this repository and run

$ python setup.py install

or simply

$ pip install dynamic-pybloom

examples

```
>>> from dynamic_pybloom import BloomFilter
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> [f.add(x) for x in range(10)]
[False, False, False, False, False, False, False, False, False, False]
>>> all([(x in f) for x in range(10)])
True
>>> 10 in f
False
>>> 5 in f
True
>>> f = BloomFilter(capacity=1000, error_rate=0.001)
>>> for i in range(0, f.capacity):
...     _ = f.add(i)
>>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
True

>>> from dynamic_pybloom import ScalableBloomFilter
>>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
>>> count = 10000
>>> for i in range(0, count):
...     _ = sbf.add(i)
>>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
True
# len(sbf) may not equal the entire input length. 0.01% error is well
# below the default 0.1% error threshold. As the capacity goes up, the
# error will approach 0.1%.

>>> from dynamic_pybloom import DynamicBloomFilter
>>> dbf = DynamicBloomFilter(base_capacity=100, max_capacity=100000, error_rate=0.001)
>>> len(dbf.filters)
1
>>> for i in range(1, 10000):
...     _ = dbf.add(i)
>>> len(dbf.filters)
100
>>> other_dbf = DynamicBloomFilter(base_capacity=100, max_capacity=100000, error_rate=0.001)