fishfinder
Recursively search a service for all possibilities until the optimal set is found
About
Many sites like North Carolina's Corporate Registry allow you to search for records but limit the results to a set number (i.e. no more than 500). Thus, in order to find the full list of results, you need to search through many combinations of letters and numbers. fishfinder
provides an abstract framework for performing such tasks by generating all possible search queries of a certain length (say two letters), returning the results, and testing the results to determine which queries to add to the set. All searches are performed via gevent
for concurrency.
Install
pip install fishfinder
Usage
fishfinder
is agnostic to the methods you need to perform a search and test its results. To create your own fishfinder, simply inherit from the FishFinder
class and overwrite the methods setup
, search
, and test
. You can see a real-world example in examples/compranet.py
.
from fishfinder import FishFinder
class MyFinder(FishFinder):
def __init__(self):
FishFinder.__init__(self, min_length=2, incl_nums=False, exclude=['a'])
def setup(self):
# This will run before the search / test loop starts
pass
def search(self, query):
# this takes a query and returns something to test
pass
def test(self, result, query):
"""
With our results, test
whether the query was legitimate.
You also have access to the query in this method.
This method should return one of three numbers:
0 = No Results
1 = Pass
2 = Needs More
"""
if __name__ == '__main__':
f = MyFinder()
# set the number of concurrent
# workers in `run`
f.run(num_workers=5)