BarkingOwl is a scalable web crawler designed to find specific document types, such as PDFs.
Not a hard-core hacker? Check out the web front-end tool for BarkingOwl here
Background and Description
BarkingOwl grew out of a need raised at a Hacks and Hackers Rochester (#hhroc) meet-up in Syracuse, NY. A journalist described needing a tool to help him search for keywords within PDFs, such as meeting minutes, posted to town websites.
I wanted the code for this project to be as reusable as possible, since I knew it had several parallels to other work I had been doing and wanted to do in the future. The solution was an architecture that would allow for significant scalability and extensibility.
How to get started
BarkingOwl is available on PyPI, so it can be installed using pip:
> pip install barkingowl
To use BarkingOwl you will need to install RabbitMQ. Information on how to install RabbitMQ can be found here: http://www.rabbitmq.com/download.html
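Before running a crawl, it can help to confirm that something is actually listening where RabbitMQ is expected. The sketch below is not part of BarkingOwl; it is a small standard-library check, and the helper name `rabbitmq_reachable` is made up for illustration. It assumes RabbitMQ is on `localhost` at the default AMQP port 5672, so adjust both if your setup differs. Note that it only verifies a TCP listener exists on that port, not that the listener is RabbitMQ.

```python
import socket


def rabbitmq_reachable(host="localhost", port=5672, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 5672 is RabbitMQ's default
    AMQP port, but your broker may be configured differently.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if rabbitmq_reachable():
        print("Something is listening on localhost:5672")
    else:
        print("Nothing reachable on localhost:5672; is RabbitMQ running?")
```

If the check fails, start the RabbitMQ server (or fix the host and port) before launching BarkingOwl.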
Check out the wiki!