moneypenny

A library for normalizing URL lists and creating Google disavow files.


Keywords
urls, disavow
License
Other
Install
pip install moneypenny==0.0.3

Documentation

Introduction

Moneypenny is a library for normalizing and handling lists of URLs. It was originally built to clean and generate disavow files for use with Google.

For example, you may have a file containing a list of URLs, or a mix of URLs and 'domain:' entries (i.e. a disavow file). If it has been aggregated from various sources, you may want to remove duplicates and superfluous entries.

First, convert it to a string and parse out the URL and 'domain:' entries using:

import_from_file('<your_filename>.txt')

Then call:

normalize_and_dedupe(<import_from_file_output>)

on the 'urls' or 'domains' list from that output, as you see fit.
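
To make the idea of normalizing and de-duplicating concrete, here is a minimal standalone sketch using only the Python standard library. It does not call Moneypenny and is not how normalize_and_dedupe() is implemented; the helper names (normalize_url, normalize_and_dedupe_sketch) and the normalization rules (lower-case the host, drop the scheme, fragment and trailing slash) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Return a canonical form of a URL for comparison (illustrative rules only)."""
    url = url.strip()
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    # The scheme and fragment are deliberately dropped in this sketch.
    return host + path + query


def normalize_and_dedupe_sketch(urls):
    """Normalize each URL and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        if not url.strip():
            continue  # skip blank lines
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With these rules, 'http://WWW.Example.com/a/', 'https://www.example.com/a' and 'www.example.com/a#top' all collapse to the single entry 'www.example.com/a'.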

Moneypenny currently handles the creation/modification of a disavow file (including maintaining comments in their original place) and the testing of a disavow file against a separate list of URLs, showing which of these would be disavowed or not, were the disavow file to be applied.

Using Moneypenny

Simply install with pip:

pip install moneypenny

To create or modify a disavow file, first call:

extract_file_contents('<your_filename>.txt')

to convert your file to a string, then pass that string to:

disavow_file_to_dict(<extract_file_contents_output>)

This takes an optional domain_limit argument, in case you want to disavow all links originating from a domain once the number of links from it exceeds your limit.

For example, ‘www.example.com/a/spam’, ‘www.example.com/#spam’ and ‘www.example.com/a/c/?spam’ can be replaced with a single 'domain:www.example.com' entry.
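
The effect of a domain limit can be sketched in a few lines of standalone Python. This is an illustration under stated assumptions, not Moneypenny's own code: the function names host_of and apply_domain_limit are hypothetical, "exceeds" is read as strictly greater than the limit, and only the two keys named in this document ('domain_entries' and 'link_entries') are produced.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Extract the lower-cased host from a URL, with or without a scheme."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def apply_domain_limit(urls, domain_limit):
    """Replace a host's links with one 'domain:' entry once it exceeds the limit."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[host_of(url)].append(url)

    domain_entries, link_entries = [], []
    for host, links in by_host.items():
        if len(links) > domain_limit:
            # Too many links from this host: disavow the whole domain.
            domain_entries.append("domain:" + host)
        else:
            link_entries.extend(links)
    return {"domain_entries": domain_entries, "link_entries": link_entries}
```

Calling apply_domain_limit() with the three www.example.com URLs above and a domain_limit of 2 yields a single 'domain:www.example.com' entry and no individual links for that host.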

The output includes summary statistics on the number of links and domains entered and disavowed, along with 'domain_entries' (the new domains produced by applying the domain limit, plus the domains from the original disavow file) and 'link_entries' (the individual links to be disavowed).

To modify your existing file, pass your original file to extract_file_contents() and use the result as the first parameter to:

combine_with_original_disavow('<your_filename>.txt', 
<disavow_file_to_dict_output>)

with the dictionary output of disavow_file_to_dict() as the second parameter. This function maintains the order (and comments) of your original disavow file.
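
One way to picture an order-preserving merge is the standalone sketch below. It is a guess at the general approach, not the behaviour of combine_with_original_disavow(): the function name merge_into_original is hypothetical, and the rules (keep comments and blank lines in place, drop links already covered by a 'domain:' entry, append anything new at the end) are assumptions.

```python
from urllib.parse import urlsplit


def host_of(entry):
    """Return the host of a URL or of a 'domain:' entry."""
    entry = entry.strip()
    if entry.startswith("domain:"):
        return entry[len("domain:"):].strip().lower()
    if "://" not in entry:
        entry = "http://" + entry
    return (urlsplit(entry).hostname or "").lower()


def merge_into_original(original_text, domain_entries, link_entries):
    """Merge new entries into a disavow file, keeping original order and comments."""
    disavowed_hosts = {host_of(d) for d in domain_entries}
    kept, present = [], set()
    for line in original_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)  # comments and blank lines stay where they were
            continue
        is_link = not stripped.startswith("domain:")
        if is_link and host_of(stripped) in disavowed_hosts:
            continue  # already covered by a 'domain:' entry
        kept.append(line)
        present.add(stripped)
    # Append entries that the original file did not already contain.
    for entry in list(domain_entries) + list(link_entries):
        if entry not in present:
            kept.append(entry)
            present.add(entry)
    return "\n".join(kept) + "\n"
```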

To test an existing disavow file against a file containing a list of URLs, call:

apply_disavow_files('<your_disavow_filename>.txt',
	'<your_urls_to_test_filename>.txt')

with your disavow file as the first parameter and your file of URLs to test as the second. The output is a dictionary whose most relevant keys are 'disavowed' and 'non_disavowed'; the remaining keys are statistics summarising the input and output files.
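
The check itself can be sketched in standalone Python as follows. This is not apply_disavow_files(): it works on strings and lists rather than filenames, the names parse_disavow and apply_disavow_sketch are hypothetical, and matching is simplified to exact link matches and exact host matches against 'domain:' entries. Only the 'disavowed' and 'non_disavowed' keys named above are returned.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Extract the lower-cased host from a URL, with or without a scheme."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def parse_disavow(text):
    """Split disavow file text into a set of domains and a set of links."""
    domains, links = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        if line.lower().startswith("domain:"):
            domains.add(line[len("domain:"):].strip().lower())
        else:
            links.add(line)
    return domains, links


def apply_disavow_sketch(disavow_text, urls):
    """Sort URLs into those the disavow file would cover and those it would not."""
    domains, links = parse_disavow(disavow_text)
    result = {"disavowed": [], "non_disavowed": []}
    for url in urls:
        url = url.strip()
        if not url:
            continue
        hit = url in links or host_of(url) in domains
        result["disavowed" if hit else "non_disavowed"].append(url)
    return result
```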

To Do

  • Port in functionality to parse files from various sources (Majestic, Kerboo, LinkResearchTools) from our older code.

Why Moneypenny?

Moneypenny disavows secret agents; we are disavowing links … geddit?

Contributing

See CONTRIBUTING file.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

See LICENSE file.