Read about it at http://python-panther.org!
Panther is a very simple Python scraping library with an emphasis on rapid development, ease of use, and cute panthers. This package is still in a very early development stage but, hey, it works!
pip install panther
How to use
Panther exposes two main methods:
pounce() takes two arguments: a URL (or list of URLs) to fetch and a CSS/XPath selector (or list of selectors) to extract, e.g.:
    import panther

    # Grab the top 125 subreddits.
    url = "http://www.redditlist.com/"
    links = panther.pounce(url, "#yw2 td:nth-child(2) a")
    # Append "gilded" to each link's href.
    urls = [a.get('href') + "gilded" for a in links]
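Since pounce() accepts either a single value or a list for both arguments, here is a minimal sketch of one way a function could normalize such inputs. This is not Panther's own code: `as_list` and `plan_requests` are hypothetical helpers made up for illustration, and the second selector is just a sample value.

```python
from itertools import product


def as_list(value):
    """Wrap a single value in a list; copy lists and tuples into a list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def plan_requests(urls, selectors):
    """Return every (url, selector) pair that would need to be checked."""
    return list(product(as_list(urls), as_list(selectors)))


# A single URL combined with two selectors yields two pairs.
pairs = plan_requests(
    "http://www.redditlist.com/",
    ["#yw2 td:nth-child(2) a", ".num"],
)
```

Normalizing up front keeps the rest of the code free of "is this a string or a list?" checks.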
prowl() takes the same two arguments plus a third: another CSS/XPath selector (or list of selectors). If that selector matches any <a> elements, it crawls those URLs as well, e.g.:
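    import panther

    url = "http://dcurt.is/the-fight"
    selectors = [".article_title a", ".num"]
    next_button = "#readnext a"

    for result in panther.prowl(url, selectors, next_button):
        # Print the text matched by each of the two selectors.
        # Assumes each result can be looked up by a single selector string.
        print(result.get(selectors[0]).text, result.get(selectors[1]).text)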
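To make the "follow the next link" idea concrete, here is a rough standard-library sketch of such a crawl loop. It is not how Panther is implemented: `NextLinkFinder`, `find_next`, `crawl`, and the in-memory `PAGES` dict are all assumptions made for illustration, and the link lookup only handles a simplified form of "#readnext a" (the first <a> inside the element with id "readnext").

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Finds the first <a href> nested inside the element with a given id.

    Simplification: assumes well-formed markup with no unclosed void
    tags (such as <br>) inside the container.
    """

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # greater than 0 while inside the container
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth > 0:
            self.depth += 1
            if tag == "a" and self.href is None:
                self.href = attrs.get("href")
        elif attrs.get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1


def find_next(html, base_url, container_id="readnext"):
    """Return the absolute URL of the next page, or None if there is none."""
    finder = NextLinkFinder(container_id)
    finder.feed(html)
    if finder.href is None:
        return None
    return urljoin(base_url, finder.href)


def crawl(start_url, fetch, max_pages=50):
    """Yield (url, html) pairs, following next links until they run out."""
    seen = set()
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next(html, url)


# In-memory stand-in for the network, so the sketch runs offline.
PAGES = {
    "http://example.com/one":
        '<h1>One</h1><div id="readnext"><a href="/two">Next</a></div>',
    "http://example.com/two":
        '<h1>Two</h1><div id="readnext"><a href="/one">Next</a></div>',
}

visited = [url for url, _ in crawl("http://example.com/one", PAGES.__getitem__)]
```

The `seen` set and `max_pages` cap stop the loop when pages link back to each other, as the two sample pages do here.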
Check out the examples folder for, well, examples.