pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites


Keywords
links, static-site-generator
License
AGPL-3.0
Install
pip install pyhtmlproofer==0.7.3a0

Documentation

pyHTMLProofer

Check websites and static HTML pages for link rot.

Features

pyHTMLProofer can be used on:

  1. Static HTML pages (typically generated by an SSG). You can specify either files or directories to be checked.
  2. Web pages. You can specify a URL to be checked.

At the moment, pyHTMLProofer does the following:

  1. Checks for broken internal links in HTML files
  2. Checks that external links in HTML files or on a website are valid
  3. Checks script and stylesheet references in HTML files
  4. Checks image references in HTML files

You can read more details in the What's tested? section below.

Roadmap

The following features are under development:

  1. Check for images and alt-text in HTML files
  2. Check Favicons
  3. Check optimal SEO meta tags
  4. Caching results
  5. Config file

Installation

Install pyHTMLProofer with pip:

pip install pyhtmlproofer

What's tested?

You can configure pyHTMLProofer to check:

  • a file
  • a directory or list of directories
  • a URL / Link

Links / Hyperlinks

a, link elements: pyHTMLProofer checks -

  • if the internal links are valid
  • if the internal references (#in-page-links) are valid
  • if the external links are valid
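
As an illustration of the in-page reference check listed above, here is a minimal sketch that uses only the standard library. It is not pyHTMLProofer's implementation; the names AnchorCollector and broken_fragments are hypothetical and exist only for this example. It treats a "#fragment" link as valid when some element on the same page has a matching id, or an a element has a matching name.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect anchor targets and in-page links."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id="..." and <a name="...">
        self.fragments = []    # href values of the form "#something"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href")
        if tag in ("a", "link") and href and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def broken_fragments(html):
    """Return in-page fragments that have no matching id or name."""
    collector = AnchorCollector()
    collector.feed(html)
    return [f for f in collector.fragments if f not in collector.targets]


page = '<h1 id="intro">Intro</h1><a href="#intro">ok</a><a href="#missing">bad</a>'
print(broken_fragments(page))  # prints ['missing']
```

Collecting all targets before comparing means a link may point at an element that appears later in the page.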

Images

img elements: pyHTMLProofer checks -

  • if the internal image references are valid
  • if the external image references are valid

Scripts

script elements: pyHTMLProofer checks -

  • if the internal script references are valid
  • if the external script references are reachable
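
All three element types above distinguish internal references from external ones. The sketch below shows one way to draw that line with urllib.parse. It is an assumption about how such a classifier could look, not the library's own logic, and classify_reference is a hypothetical name.

```python
from urllib.parse import urlparse


def classify_reference(ref):
    """Hypothetical helper: label an href/src value by where it points."""
    parsed = urlparse(ref)
    # Absolute http(s) URLs and protocol-relative URLs (//host/path) leave the site.
    if parsed.scheme in ("http", "https") or parsed.netloc:
        return "external"
    # Other schemes (mailto:, tel:, data:, ...) are neither files nor web pages.
    if parsed.scheme:
        return "other"
    # A bare "#name" points inside the current page.
    if ref.startswith("#"):
        return "fragment"
    # Everything else is a path relative to the site or the current file.
    return "internal"


for ref in ["https://example.com", "//cdn.example.com/app.js", "#top", "css/site.css"]:
    print(ref, "->", classify_reference(ref))
```

An internal reference can then be checked against the file system, while an external one needs an HTTP request.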

Usage

a) To check a file:

import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()

b) To check directories:

import pyHtmlProofer
directory_paths = ["path/to/1/", "path/to/2/"]
pyHtmlProofer.directories(directory_paths).check()

c) To validate URL(s):

import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()

Available Config Options

PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}

You can override any of the default configuration options by passing a dictionary that contains only the options you want to change.

import pyHtmlProofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/1/", "path/to/2/"]

pyHtmlProofer.directories(directory_paths, options=options).check()
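
To make the effect of an override concrete, here is a minimal sketch of merging user options over the defaults listed above. How pyHTMLProofer merges options internally is not documented here, so merge_options is a hypothetical function, and rejecting unknown keys is an assumption of this example rather than confirmed library behaviour.

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def merge_options(options=None):
    """Hypothetical helper: return the defaults with user options applied on top."""
    options = options or {}
    # Assumption: an unrecognised key is more likely a typo than intent.
    unknown = set(options) - set(PROOFER_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    # Later keys win, so user values replace the defaults; the defaults dict is untouched.
    return {**PROOFER_DEFAULTS, **options}


merged = merge_options({"disable_external": True})
print(merged["disable_external"], merged["log_level"])  # prints True ERROR
```

Options that are not passed keep their default values, so a dictionary with a single key is enough to change a single setting.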

Credits

pyHTMLProofer was inspired by the Ruby-based HTMLProofer and by the lack of Python-based alternatives. It is not a Python rewrite of HTMLProofer, however. Instead, it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.