Simple crawling and extraction in Python.

pip install panther



Read about it at!

Get it?  It's a panther.

Panther is a very simple Python scraping library with an emphasis on rapid development, ease of use, and cute panthers. This package is still in a very early development stage but, hey, it works!



How to use

Panther exposes two main methods, pounce() and prowl().

pounce() takes two arguments: a URL (or list of URLs) to fetch and a CSS/XPath selector (or list of selectors) to extract, e.g.:

import panther

# Grab the links to the top 125 subreddits, then build each one's "gilded" URL.
url = ""
links = panther.pounce(url, "#yw2 td:nth-child(2) a")
urls = [a.get('href') + "gilded" for a in links]
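The href values read from the matched links may be relative (for example "/r/python/"), in which case they need to be resolved against the page URL before they can be fetched. A minimal sketch using only the standard library follows; absolute_urls, the example.com page URL, and the sample hrefs are all hypothetical and not part of Panther.

```python
from urllib.parse import urljoin


def absolute_urls(page_url, hrefs, suffix=""):
    """Resolve each href against page_url, then append suffix.

    Empty or missing hrefs (None) are skipped.
    """
    return [urljoin(page_url, href) + suffix for href in hrefs if href]


# Hypothetical values; the real page URL and hrefs depend on the site.
page_url = "https://example.com/top/"
hrefs = ["/r/python/", "r/learnpython/", None, "https://other.example/r/foo/"]

urls = absolute_urls(page_url, hrefs, suffix="gilded")
for u in urls:
    print(u)
```

Root-relative hrefs replace the path of the page URL, plain relative hrefs are appended to it, and absolute hrefs pass through unchanged.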

prowl() takes those same two arguments, plus a third: another CSS/XPath selector (or list of selectors). If that selector matches any links (a elements) on a page, prowl() crawls those URLs as well, e.g.:

import panther

url = ""
selectors = [".article_title a", ".num"]
next_button = "#readnext a"

for result in panther.prowl(url, selectors, next_button):
    # result.get(selector) holds the matches for that selector; print the first of each.
    print(result.get(selectors[0])[0].text, result.get(selectors[1])[0].text)
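The crawl-and-follow behaviour described for prowl() can be pictured as a simple loop: fetch a page, yield what was extracted, then queue whatever the "next" selector matched. The sketch below is an illustration of that general pattern only, not Panther's actual implementation; crawl, fetch, extract, find_next, max_pages, and the in-memory PAGES site are all hypothetical names made up for this example.

```python
def crawl(start_url, fetch, extract, find_next, max_pages=50):
    """Yield extract(page) for start_url and every page reachable from it.

    fetch(url) -> page, extract(page) -> result, and
    find_next(page) -> list of URLs are supplied by the caller.
    """
    queue = [start_url]
    seen = set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue  # skip pages already visited so cyclic links terminate
        seen.add(url)
        page = fetch(url)
        yield extract(page)
        queue.extend(find_next(page))


# Tiny in-memory "site" so the sketch runs without network access.
PAGES = {
    "page1": {"title": "First", "next": ["page2"]},
    "page2": {"title": "Second", "next": ["page3"]},
    "page3": {"title": "Third", "next": ["page1"]},  # links back to the start
}

titles = list(crawl(
    "page1",
    fetch=PAGES.get,
    extract=lambda page: page["title"],
    find_next=lambda page: page["next"],
))
print(titles)
```

Writing the loop as a generator mirrors the for-loop usage shown for prowl(): results arrive one page at a time, and the caller can stop early without crawling the rest.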

Check out the examples folder for, well, examples.


Requirements

  • cssselect
  • lxml
  • requests