getsitemap

Retrieve all URLs from a sitemap.


Keywords: crawling, python, sitemap
License: MIT
Install: pip install getsitemap==0.1.5

Documentation


getsitemap is a Python library that retrieves every URL listed in a website's sitemaps.

This project may be useful if you are building a search crawler or a validator that checks the status codes of sitemap URLs.

You can read the documentation for this project on Read the Docs.
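For readers unfamiliar with the format, the following is a minimal sketch of what reading a sitemap involves. It uses only the standard library and is not getsitemap's own implementation; the function name `extract_locs` and the example URLs are hypothetical. It assumes the sitemap follows the sitemaps.org protocol, where a `urlset` document lists pages and a `sitemapindex` document lists further sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return (kind, locs) for a sitemap document (hypothetical helper).

    kind is "index" when the document lists other sitemaps and
    "urlset" when it lists page URLs.
    """
    root = ET.fromstring(xml_text)
    # Tags are namespaced as "{namespace}name"; keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    kind = "index" if local_name == "sitemapindex" else "urlset"
    # Collect the text of every <loc> element in the sitemap namespace.
    locs = [
        el.text.strip()
        for el in root.iterfind(".//sm:loc", SITEMAP_NS)
        if el.text
    ]
    return kind, locs


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_locs(example))
```

A recursive retriever would fetch each `loc` of an index document and repeat the same step on the result, which is the kind of work a library such as getsitemap saves you from writing by hand.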

Installation 💻

To get started, install getsitemap with pip:

pip install getsitemap

Quickstart ⚡

Get all URLs recursively in all sitemaps

import getsitemap

urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")

print(urls)

Get all URLs in a single sitemap

import getsitemap

all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")

print(all_urls)
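As an illustration of the status code validation use case mentioned earlier, here is a minimal sketch of a checker that could be run over retrieved URLs. It is not part of getsitemap; the names `fetch_status` and `find_broken` are hypothetical, and it assumes you have a flat list of URL strings. The return shape of the getsitemap functions is not described here, so you may need to flatten their output first.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; keep the code.
        return error.code
    except urllib.error.URLError:
        # DNS failures, refused connections, and similar network errors.
        return None


def find_broken(urls, get_status=fetch_status):
    """Return a dict of url -> status for every URL whose status is not 200."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status != 200:
            broken[url] = status
    return broken
```

Calling `find_broken(urls)` on a list of URLs returns only the entries that did not answer with 200. Because `urlopen` follows redirects, a redirected URL is reported with the status of its final destination. The `get_status` parameter exists so the network call can be replaced in tests.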

Code Quality

This library uses tox, pytest, and flake8 to check code quality.

To run code quality checks, run the following command:

tox

License 👩‍⚖️

This project is licensed under an MIT License.

Contributing 🛠️

We would love your help improving getsitemap. Have an idea for a new feature or a bug to fix? Open a GitHub Issue to start a discussion!

If you have

Contributors 💻

  • capjamesg