ultimate-sitemap-parser

Ultimate Sitemap Parser


Keywords
sitemap, sitemap-xml, parser, python, python-3, python3, robots-txt, xml-sitemap, xml-sitemap-parser
License
GPL-3.0+
Install
pip install ultimate-sitemap-parser==0.5

Documentation

Website sitemap parser for Python 3.5+.

Features

Installation

pip install ultimate_sitemap_parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represents the sitemap hierarchy found on the website; see the reference documentation for the AbstractSitemap subclasses.
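To make the hierarchy easier to inspect, the tree can be walked recursively. The following is a minimal sketch, not part of the library: format_sitemap_tree is a hypothetical helper, and it assumes that every sitemap object exposes a url attribute and that index-style sitemaps expose their children through a sub_sitemaps attribute. Stand-in objects are used so that the sketch runs without network access.

```python
from types import SimpleNamespace


def format_sitemap_tree(sitemap, depth=0):
    """Return one indented line per sitemap in the given (sub)tree."""
    # Assumption: each sitemap object has a `url` attribute.
    lines = ["  " * depth + sitemap.url]
    # Assumption: only index sitemaps carry `sub_sitemaps`;
    # objects without the attribute are treated as leaves.
    for child in getattr(sitemap, "sub_sitemaps", []):
        lines.extend(format_sitemap_tree(child, depth + 1))
    return lines


# Stand-in objects (hypothetical URLs) so the sketch runs offline.
leaf = SimpleNamespace(url="https://example.com/sitemap-news.xml")
root = SimpleNamespace(url="https://example.com/robots.txt", sub_sitemaps=[leaf])

demo_lines = format_sitemap_tree(root)
print("\n".join(demo_lines))
```

With a real tree, the same helper would be called as format_sitemap_tree(tree) on the object returned by sitemap_tree_for_homepage().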

If you just want to list all the pages found in all of the website's sitemaps, use the all_pages() method:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

The all_pages() method returns an iterator that yields SitemapPage objects; see the reference documentation for SitemapPage.
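Because all_pages() yields pages lazily, it can be convenient to collect the URLs into a list. The sketch below is illustrative only: unique_page_urls and urls_for_homepage are hypothetical helpers, the code assumes each SitemapPage exposes its address through a url attribute, and it assumes the same URL may be listed in more than one sitemap, which is why duplicates are dropped. Stand-in objects are used so that the example runs offline.

```python
from types import SimpleNamespace


def unique_page_urls(pages):
    """Return page URLs in first-seen order, skipping duplicates."""
    seen = set()
    urls = []
    for page in pages:
        url = page.url  # assumption: pages expose a `url` attribute
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def urls_for_homepage(homepage_url):
    """Fetch the sitemap tree for a homepage and list its unique page URLs."""
    # Imported lazily so unique_page_urls() works without the library installed.
    from usp.tree import sitemap_tree_for_homepage

    tree = sitemap_tree_for_homepage(homepage_url)
    return unique_page_urls(tree.all_pages())


# Stand-in pages (hypothetical URLs) so the sketch runs without network access.
demo_pages = [
    SimpleNamespace(url="https://example.com/a"),
    SimpleNamespace(url="https://example.com/b"),
    SimpleNamespace(url="https://example.com/a"),
]
demo_urls = unique_page_urls(demo_pages)
print(demo_urls)
```

Note that collecting every URL holds the whole result in memory; for very large sites, iterating over all_pages() directly may be the better choice.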