hypotonic

Fast asynchronous web scraper with minimalist API.


License
MIT
Install
pip install hypotonic==0.0.6

Documentation

Hypotonic

Fast asynchronous web scraper with minimalist API inspired by awesome node-osmosis.

Hypotonic provides SQLAlchemy-like command chaining DSL to define HTML scrapers. Everything is executed asynchronously via asyncio and is ultra-fast thanks to lxml parser. Supports querying by XPath or CSS selectors.

Hypotonic does not natively execute JavaScript on websites and it is recommended to use prerender.

Installing

Hypotonic requires Python 3.6+ and libxml2 C library.

pip install hypotonic

Example

from hypotonic import Hypotonic

data, errors = (
  Hypotonic()
    .get('http://books.toscrape.com/')
    .paginate('.next a::attr(href)', 5)
    .find('.product_pod h3')
    .set('title')
    .follow('a::attr(href)')
    .set({'price': '.price_color',
          'availability': 'p.availability'})
    .data()
)