pyHTMLProofer
Check websites and static HTML pages for link rot.
Features
pyHTMLProofer can be used on:
- Static HTML pages (typically generated by an SSG). You can specify either files or directories to be checked.
- Webpages. You can specify a URL/link to be checked.
pyHTMLProofer at the moment does the following:
- Checks for broken internal links in HTML files
- Checks whether external links in HTML files or on a website are valid
- Checks script / stylesheet references in HTML files
- Checks image references in HTML files
You can read more details below in the What's tested? section.
Roadmap
The following features are under development:
- Check for images and alt-text in HTML files
- Check Favicons
- Check optimal SEO meta tags
- Caching results
- Config file
Installation
Install pyHTMLProofer with pip:
pip install pyhtmlproofer
What's tested?
You can configure pyHTMLProofer to check:
- a file
- a directory or list of directories
- a URL / Link
Links / Hyperlinks
a, link elements: pyHTMLProofer checks:
- If the internal links are valid
- If the internal references (#in-page-links) are valid
- If the external links are valid
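To make the in-page reference check more concrete, here is a minimal sketch of how such a check could be done with Python's standard `html.parser` module. It is not pyHTMLProofer's implementation; the names `AnchorCollector` and `broken_fragments` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets (id / name attributes) and in-page hrefs."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id="..." and <a name="...">
        self.fragments = []    # "#..." hrefs found on <a> and <link>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        # Only same-page references such as "#section" are considered here.
        if tag in ("a", "link") and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def broken_fragments(html):
    """Return in-page references that have no matching target."""
    collector = AnchorCollector()
    collector.feed(html)
    return [f for f in collector.fragments if f not in collector.targets]


sample = '<h1 id="top">Title</h1><a href="#top">up</a><a href="#missing">x</a>'
print(broken_fragments(sample))
```

Targets are collected for the whole document before comparing, so a link may point to an element that appears later in the page.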
Images
img elements: pyHTMLProofer checks:
- If the internal image references are valid
- If the external image references are valid
Scripts
script elements: pyHTMLProofer checks:
- If the internal script references are valid
- If the external script references are reachable
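The checks above all distinguish internal references from external ones. As a rough illustration of that distinction, the sketch below treats a reference as external when it names a network location. This is an assumption made for the example, not a description of how pyHTMLProofer classifies references, and `is_external` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_external(reference):
    """Return True when the reference names a host (assumed rule)."""
    # urlsplit puts the host in .netloc for "https://host/..." and for
    # scheme-relative references such as "//host/...".
    return bool(urlsplit(reference).netloc)


for ref in ["https://example.com/app.js", "//cdn.example.com/x.js",
            "/assets/app.js", "images/logo.png", "#top"]:
    print(ref, "external" if is_external(ref) else "internal")
```

Under this rule, relative paths, root-relative paths, and fragments count as internal and could be resolved against the local file tree, while everything with a host would need a network request.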
Usage
a) To check a file:
import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()
b) To check directories:
import pyHtmlProofer
directory_paths = ["path/to/dir1", "path/to/dir2"]
pyHtmlProofer.directories(directory_paths).check()
c) To validate URL(s):
import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()
Available Config Options
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}
You can override the default configuration options by passing a dictionary of options.
import pyHtmlProofer
options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/dir1", "path/to/dir2"]
pyHtmlProofer.directories(directory_paths, options=options).check()
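For intuition about what overriding means, the following sketch shows one common way a dictionary of overrides can be layered on top of defaults. It assumes a simple shallow merge. How pyHTMLProofer actually combines the two is not documented here, and `resolve_options` is a hypothetical name.

```python
# Defaults copied from the list above.
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def resolve_options(overrides=None):
    """Return a new dict: defaults first, then caller-supplied overrides."""
    # Shallow merge (assumed): keys in overrides replace the defaults,
    # every other key keeps its default value.
    return {**PROOFER_DEFAULTS, **(overrides or {})}


effective = resolve_options({"log_level": "ERROR", "disable_external": True})
print(effective["disable_external"])  # value supplied by the caller
print(effective["enforce_https"])     # default left untouched
```

With a merge like this, only the keys you pass change, so a short options dictionary is enough to switch off external checks while keeping every other default.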
Credits
pyHTMLProofer was inspired by the Ruby-based HTMLProofer and by the lack of Python-based alternatives. It is not a Python rewrite, however; it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.