zimscraperlib

Collection of Python tools for reusing common code across scrapers


Keywords
offline, openzim, zim, library, python, webscraping
License
CNRI-Python-GPL-Compatible
Install
pip install zimscraperlib==3.3.2

Documentation

Usage

  • This library is meant to be installed from PyPI (zimscraperlib).
  • Pin it to a version range when you reference it, as the API is subject to frequent changes.
  • The API should only be expected to remain the same within a given minor version.

Example version specifier:

zimscraperlib>=1.1,<1.2
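
The pinning rule above can be sketched in a few lines of Python. This is only an illustration, not part of zimscraperlib: `minor_pin` and `same_minor` are hypothetical helper names, and the sketch assumes plain dotted numeric versions such as `1.1.0`.

```python
def minor_pin(version: str, package: str = "zimscraperlib") -> str:
    """Build a requirement that stays within the minor series of `version`."""
    # Only the first two components (major, minor) matter for the pin.
    major, minor = (int(part) for part in version.split(".")[:2])
    return f"{package}>={major}.{minor},<{major}.{minor + 1}"


def same_minor(installed: str, pinned: str) -> bool:
    """Return True when both versions share major and minor components."""
    return installed.split(".")[:2] == pinned.split(".")[:2]


if __name__ == "__main__":
    print(minor_pin("1.1.0"))  # zimscraperlib>=1.1,<1.2
```

A scraper could write the string returned by `minor_pin` to its requirements file, so that patch releases are picked up while a new minor version requires an explicit update.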

Dependencies

  • libmagic
  • wget
  • libzim (auto-installed, not available on Windows)
  • Pillow
  • FFmpeg
  • gifsicle (>=1.92)

macOS

brew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle

Linux (Debian/Ubuntu)

sudo apt install libmagic1 wget ffmpeg \
    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle

Alpine

apk add ffmpeg gifsicle libmagic wget libjpeg

Note: i18n features do not work on Alpine (see #134); the one corresponding test fails there.
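
After installing the system packages, a quick sanity check can confirm that the command-line dependencies are reachable. The sketch below is not part of zimscraperlib. It assumes the executables are named `wget`, `ffmpeg` and `gifsicle`, and `missing_binaries` and `version_at_least` are hypothetical helpers. The version helper only parses a string you pass to it; the sample format `LCDF Gifsicle 1.92` is an assumption about what `gifsicle --version` prints.

```python
import re
import shutil

# Assumed executable names for the command-line dependencies listed above.
REQUIRED_BINARIES = ("wget", "ffmpeg", "gifsicle")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def version_at_least(version_text, minimum=(1, 92)):
    """Compare the first dotted number in `version_text` with `minimum`."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        # No recognizable version number: treat as not satisfying the minimum.
        return False
    found = (int(match.group(1)), int(match.group(2)))
    return found >= tuple(minimum)


if __name__ == "__main__":
    print("missing binaries:", missing_binaries())
```

libmagic, libzim and Pillow are libraries rather than executables, so a PATH lookup like this does not cover them.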

Contribution

This project adheres to openZIM's Contribution Guidelines.

pip install hatch
pip install ".[dev]"
pre-commit install
# For tests
invoke coverage

Users

Non-exhaustive list of scrapers using it (check status when updating API):