selectorlib
Release 0.8.0

A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages.

Keywords: selectorlib, python, scraping, selectors, xpath
License: MIT
Install: pip install selectorlib==0.8.0

Documentation

Free software: MIT license
Documentation: https://selectorlib.readthedocs.io.

Example

>>> from selectorlib import Extractor
>>> yaml_string = """
... title:
...     css: "h1"
...     type: Text
... link:
...     css: "h2 a"
...     type: Link
... """
>>> extractor = Extractor.from_yaml_string(yaml_string)
>>> html = """
... <h1>Title</h1>
... <h2>Usage
...     <a class="headerlink" href="http://test">¶</a>
... </h2>
... """
>>> extractor.extract(html)
{'title': 'Title', 'link': 'http://test'}
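To show roughly what the extractor does with these rules, here is a minimal sketch built only on the Python standard library. It is not selectorlib's implementation. It supports just the Text and Link types from the example above, understands only CSS selectors made of tag names separated by spaces (the descendant combinator, as in "h2 a"), and takes the rules as a Python dict instead of parsing YAML. The names Node, TreeBuilder, select and extract are hypothetical. Returning None for a selector that matches nothing is also an assumption of this sketch.

```python
from html.parser import HTMLParser

# Hypothetical sketch, NOT selectorlib's implementation: supports only the
# Text and Link types and tag-name-only CSS selectors such as "h1" or "h2 a".

# Elements with no closing tag; they must not become the "current" parent.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal element: tag name, attributes and children in document order."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # each child is either a str (text) or a Node

    def text(self):
        # Concatenate this element's text and all descendant text.
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def iter(self):
        # Depth-first walk over this element and every descendant element.
        yield self
        for child in self.children:
            if isinstance(child, Node):
                yield from child.iter()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; stray end tags are ignored.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def select(root, css):
    """Return elements matching a space-separated chain of tag names."""
    *ancestors, target = css.split()
    matches = []
    for node in root.iter():
        if node.tag != target:
            continue
        # Each ancestor tag must appear, right to left, further up the tree.
        p, ok = node.parent, True
        for tag in reversed(ancestors):
            while p is not None and p.tag != tag:
                p = p.parent
            if p is None:
                ok = False
                break
            p = p.parent
        if ok:
            matches.append(node)
    return matches


def extract(rules, html):
    """Apply {name: {"css": ..., "type": "Text" | "Link"}} rules to HTML."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    result = {}
    for name, rule in rules.items():
        found = select(builder.root, rule["css"])
        if not found:
            result[name] = None  # assumption: no match yields None
        elif rule["type"] == "Text":
            result[name] = found[0].text().strip()
        elif rule["type"] == "Link":
            result[name] = found[0].attrs.get("href")
        else:
            raise ValueError(f"unsupported type: {rule['type']}")
    return result


# The same rules and page as the doctest above, with the YAML written as a dict.
rules = {
    "title": {"css": "h1", "type": "Text"},
    "link": {"css": "h2 a", "type": "Link"},
}
html = """
<h1>Title</h1>
<h2>Usage
    <a class="headerlink" href="http://test">¶</a>
</h2>
"""
print(extract(rules, html))  # {'title': 'Title', 'link': 'http://test'}
```

Taking only the first match per field mirrors the single-value output in the example; lists of matches, other attributes, class or id selectors, and XPath would all need a real selector engine.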

Dependencies: 3
Dependent packages: 3
Dependent repositories: 0
Total releases: 14
Latest release: Jan 8, 2020
First release: May 21, 2019
Stars: 12
Forks: 4
Watchers: 0
Contributors: 4
Repository size: 334 KB
SourceRank: 9