sitemappy

A package for the generation and analysis of sitemaps


Keywords
python3, webscrape, sitemap, network
License
MIT
Install
pip install sitemappy==0.4


A sitemap generator tool that creates a representation of a site's network graph for the analysis and visualization of websites.

sitemappy lets you quickly generate sitemaps and convert them to the most commonly used graph formats, including weighted and unweighted adjacency lists, adjacency matrices, and node and edge tuples. There is no need to reformat graphs by hand, because sitemappy can do it for you.

Example Usage:

from sitemappy import SiteNode, SiteMap

siteMap = SiteMap("https://google.com", "optional/path")
siteMap.create_map(total_iterations=3)

print(siteMap.adjacency_list)

Output:

{
    "https://google.com": {
        "https://google.com/preferences?hl=en": 1,
        "https://google.com/advanced_search?hl=en&authuser=0": 1,
        "https://google.com/intl/en/ads/": 1,
        "https://google.com/services/": 1,
        "https://google.com/intl/en/about.html": 1,
        "https://google.com/intl/en/policies/privacy/": 1,
        "https://google.com/intl/en/policies/terms/": 1
    },
    "https://google.com/preferences?hl=en": {
        "https://google.com/webhp?tab=ww": 2,
        "https://google.com/support/websearch?p=ws_cookies_notif&hl=en": 1,
        "https://google.com//support.google.com/websearch?p=ws_settings_safesearch&hl=en": 1,
        "https://google.com/history/optout?hl=en": 1,
        "https://google.com//support.google.com/accounts/answer/61416?hl=en": 1,
        "https://google.com/url?q=https://support.google.com/websearch/%3Fp%3Dws_results_help%26hl%3Den%26fg%3D1&sa=U&ved=0ahUKEwj2numY16_pAhWamXIEHR1QBVUQ8KwCCAI&usg=AOvVaw3NqJ-lkgt0Qo7CjnE2Ayd6": 1,
        "https://google.com/url?q=https://policies.google.com/privacy%3Ffg%3D1&sa=U&ved=0ahUKEwj2numY16_pAhWamXIEHR1QBVUQ8awCCAM&usg=AOvVaw0pArgPRyp-vaqCgaGzMfc1": 1,
        "https://google.com/url?q=https://policies.google.com/terms%3Ffg%3D1&sa=U&ved=0ahUKEwj2numY16_pAhWamXIEHR1QBVUQ8qwCCAQ&usg=AOvVaw1_aRUiN3Fum2y_zLry3YVc": 1
    },
    "https://google.com/advanced_search?hl=en&authuser=0": {
        "https://google.com/preferences?hl=en": 2,
        "https://google.com/?hl=en": 1,
        "https://google.com//support.google.com/websearch?p=adv_safesearch&hl=en": 2,
        "https://google.com//support.google.com/websearch?p=ws_images_usagerights&hl=en": 1,
        "https://google.com//support.google.com/websearch?p=adv_pages_similar&hl=en": 1,
        "https://google.com//support.google.com/websearch?p=adv_pages_visited&hl=en": 1,
        "https://google.com//support.google.com/websearch?p=adv_operators&hl=en": 1,
        "https://google.com/url?q=https://support.google.com/websearch/%3Fp%3Dws_results_help%26hl%3Den%26fg%3D1&sa=U&ved=0ahUKEwjapf-Y16_pAhXDl3IEHTlQArsQ8KwCCAE&usg=AOvVaw0aMA8kBopBDZs29Ql7he0B": 1,
        "https://google.com/url?q=https://policies.google.com/privacy%3Ffg%3D1&sa=U&ved=0ahUKEwjapf-Y16_pAhXDl3IEHTlQArsQ8awCCAI&usg=AOvVaw14grGozHMcjO9vdd86wBY0": 1,
        "https://google.com/url?q=https://policies.google.com/terms%3Ffg%3D1&sa=U&ved=0ahUKEwjapf-Y16_pAhXDl3IEHTlQArsQ8qwCCAM&usg=AOvVaw2RiqOr00-5Bzcl6aF4sS9w": 1
    }
}
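
The output above is a nested dictionary: each key is a crawled page, and each inner dictionary maps a linked URL to a weight. Assuming the weight counts how often that link appears on the page (the output suggests this, but it is not stated here), a structure of this shape can be analyzed with the standard library alone. The following is a minimal sketch; the sample data and the helper names out_degree and weighted_in_degree are hypothetical and not part of sitemappy.

```python
from collections import Counter

# Hypothetical sample in the same shape as the output above.
adjacency_list = {
    "https://example.com": {
        "https://example.com/about": 1,
        "https://example.com/contact": 2,
    },
    "https://example.com/about": {
        "https://example.com/contact": 1,
    },
}


def out_degree(adj):
    """Number of distinct pages each crawled page links to."""
    return {page: len(links) for page, links in adj.items()}


def weighted_in_degree(adj):
    """Sum of incoming link weights for every linked page."""
    totals = Counter()
    for links in adj.values():
        for target, weight in links.items():
            totals[target] += weight
    return dict(totals)


print(out_degree(adjacency_list))
print(weighted_in_degree(adjacency_list))
```

Pages with a high weighted in-degree are the ones most heavily linked to from the crawled part of the site, which is often a useful first signal when exploring a sitemap.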

Different Graph Types

There are many ways to represent a graph in code, and libraries and tools usually expect one specific representation. sitemappy's built-in methods let you get whichever representation you need, whenever you need it.

from sitemappy import SiteNode, SiteMap

siteMap = SiteMap("https://google.com", "optional/path")
siteMap.create_map(total_iterations=3)

# Weighted adjacency list
weighted_adj_list = siteMap.get_adjacency_list()
# Unweighted adjacency list
unweighted_adj_list = siteMap.get_unweighted_adjacency_list()
# Node and edge tuples
nodes_and_links = siteMap.get_nodes_and_edges()
# Adjacency matrix
adj_matrix = siteMap.get_adjacency_matrix()
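
To make the relationship between these representations concrete, here is a minimal pure-Python sketch that derives the other three forms from a weighted adjacency list. It does not use sitemappy. The function names are hypothetical, and the exact shapes sitemappy returns (for example, whether edges carry weights) may differ from what is assumed here.

```python
def to_unweighted(adj):
    """Drop weights: each page maps to a list of the pages it links to."""
    return {page: list(links) for page, links in adj.items()}


def to_nodes_and_edges(adj):
    """Return (nodes, edges); edges are (source, target, weight) tuples."""
    nodes, seen, edges = [], set(), []
    for source, links in adj.items():
        # Record each URL once, in first-seen order.
        for url in (source, *links):
            if url not in seen:
                seen.add(url)
                nodes.append(url)
        for target, weight in links.items():
            edges.append((source, target, weight))
    return nodes, edges


def to_matrix(adj):
    """Return (nodes, matrix) with matrix[i][j] = weight of nodes[i] -> nodes[j]."""
    nodes, edges = to_nodes_and_edges(adj)
    index = {url: i for i, url in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
    return nodes, matrix


# Hypothetical three-page site.
weighted = {"a": {"b": 1, "c": 2}, "b": {"c": 1}}

print(to_unweighted(weighted))
print(to_nodes_and_edges(weighted))
print(to_matrix(weighted))
```

The adjacency list is compact for sparse sites, while the matrix form is convenient for numerical work at the cost of one row and one column per page.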

sitemappy Module Installation

The recommended way to install sitemappy is via pip:

$ pip install sitemappy
Collecting sitemappy
...
Installing collected packages: sitemappy
Successfully installed sitemappy-0.x

Documentation

Documentation is currently available only as comments in the source code.

(Standalone documentation coming soon...)