getsitemap
getsitemap is a Python library that retrieves every URL listed in the sitemaps on a website.
This project may be useful if you are building a search crawler or a validator that checks the status codes of sitemap URLs.
You can read the documentation for this project on Read the Docs.
Installation 💻
To get started, pip install getsitemap:
pip install getsitemap
Quickstart ⚡
Get all URLs in a single sitemap
import getsitemap
urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")
print(urls)
Get all URLs recursively in all sitemaps
import getsitemap
all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")
print(all_urls)
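For context on the difference between reading a single sitemap and following every sitemap recursively, here is a minimal standard-library sketch. It is not getsitemap's implementation. The `parse_sitemap` and `collect_urls` helpers, the `fetch` callable, and the `example.com` data are all hypothetical and exist only for illustration. The sketch assumes sitemaps follow the sitemaps.org XML format, where a `<sitemapindex>` points to other sitemaps and a `<urlset>` lists page URLs.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Parse one sitemap document and return (kind, locations).

    kind is "index" when the document is a <sitemapindex> (locations are
    child sitemap URLs) and "urlset" otherwise (locations are page URLs).
    """
    root = ET.fromstring(xml_text)
    if root.tag.endswith("sitemapindex"):
        kind, entries = "index", root.findall("sm:sitemap", NS)
    else:
        kind, entries = "urlset", root.findall("sm:url", NS)
    locations = [
        entry.findtext("sm:loc", default="", namespaces=NS).strip()
        for entry in entries
    ]
    return kind, [loc for loc in locations if loc]


def collect_urls(sitemap_url, fetch, seen=None):
    """Recursively collect page URLs, following sitemap indexes.

    fetch is a hypothetical callable that maps a URL to its XML text,
    so the sketch can run without network access.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against sitemaps that reference each other
        return []
    seen.add(sitemap_url)
    kind, locations = parse_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return locations
    urls = []
    for child in locations:
        urls.extend(collect_urls(child, fetch, seen))
    return urls


# Illustrative in-memory "website": one index pointing at one URL set.
PAGES = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
}
```

Calling `collect_urls("https://example.com/sitemap.xml", PAGES.__getitem__)` walks the index and returns the page URLs from the nested sitemap, while `parse_sitemap` on its own only reports what a single document contains.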
Code Quality
This library uses tox, pytest, and flake8 to ensure code quality.
To run code quality checks, run the following command:
tox
License 👩‍⚖️
This project is licensed under the MIT License.
Contributing 🛠️
We would love your help improving getsitemap. Have an idea for a new feature or a bug to fix? Open a GitHub Issue to start a discussion!
If you have
Contributors 💻
- capjamesg