wayback-news-search

Wayback Machine news archive search API client


License
Apache-2.0
Install
pip install wayback-news-search==1.2.1

Documentation

Wayback Machine News Archive Client

🚧 under construction 🚧

A simple client library to access the Wayback Machine news archive search.

Installation

pip install wayback-news-search

Basic Usage

Counting matching stories:

from waybacknews.searchapi import SearchApiClient
import datetime as dt

api = SearchApiClient("mediacloud")
api.count("coronavirus", dt.datetime(2022, 3, 1), dt.datetime(2022, 4, 1))
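To see how a count changes over time, one option is to split the date range into calendar-month windows and call the count method once per window. The sketch below is not part of the library: `month_windows` and `count_by_window` are hypothetical helper names, the datetimes are assumed to be naive (no timezone), and whether the end date is inclusive is not stated here, so stories that fall exactly on a window boundary might be counted twice.

```python
import datetime as dt
from typing import Callable, Iterable, Iterator, List, Tuple

Window = Tuple[dt.datetime, dt.datetime]


def month_windows(start: dt.datetime, end: dt.datetime) -> Iterator[Window]:
    """Yield consecutive (window_start, window_end) pairs that cover
    start..end, split at calendar-month boundaries."""
    cursor = start
    while cursor < end:
        # First instant of the month following the cursor's month.
        if cursor.month == 12:
            next_month = dt.datetime(cursor.year + 1, 1, 1)
        else:
            next_month = dt.datetime(cursor.year, cursor.month + 1, 1)
        window_end = min(next_month, end)
        yield cursor, window_end
        cursor = window_end


def count_by_window(
    count_fn: Callable[[str, dt.datetime, dt.datetime], int],
    query: str,
    windows: Iterable[Window],
) -> List[Tuple[dt.datetime, int]]:
    """Call count_fn(query, start, end) for each window and pair each
    result with the start of its window."""
    return [(w_start, count_fn(query, w_start, w_end)) for w_start, w_end in windows]
```

With the client from the example above, this could be called as `count_by_window(api.count, "coronavirus", month_windows(dt.datetime(2022, 1, 1), dt.datetime(2022, 4, 1)))`.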

Paging over all matching results:

from waybacknews.searchapi import SearchApiClient
import datetime as dt

api = SearchApiClient("mediacloud")
for page in api.all_articles("coronavirus", dt.datetime(2022, 3, 1), dt.datetime(2022, 4, 1)):
    do_something(page)  # placeholder: replace with your own per-page handling
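If you would rather handle stories one at a time than page by page, a small generator can flatten the pages. This is a sketch under an assumption the text above does not confirm: that each page is an iterable of story records (for example, a list of dicts). `iter_stories` and `first_n` are hypothetical helpers, not library methods.

```python
import itertools
from typing import Any, Iterable, Iterator, List


def iter_stories(pages: Iterable[Iterable[Any]]) -> Iterator[Any]:
    """Yield individual stories from an iterable of pages, in order."""
    for page in pages:
        for story in page:
            yield story


def first_n(pages: Iterable[Iterable[Any]], n: int) -> List[Any]:
    """Collect at most n stories. Iteration stops as soon as n stories
    have been seen, so later pages are not requested from a lazy source."""
    return list(itertools.islice(iter_stories(pages), n))
```

For example, `first_n(api.all_articles("coronavirus", start, end), 100)` would collect up to 100 stories, where `start` and `end` are the same kind of `datetime` values used above.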

Dev Installation

Install the package in editable mode with its dev dependencies: pip install -e ".[dev]"

Distribution

  1. Run pytest to make sure all the tests pass
  2. Update the version number in waybacknews/__init__.py
  3. Make a brief note in the version history section below about the changes
  4. Commit the changes
  5. Tag the commit with a semantic version number - 'v*.*.*'
  6. Push the repo to GitHub
  7. Run python setup.py sdist to create an installation package
  8. Run twine upload --repository-url https://test.pypi.org/legacy/ dist/* to upload it to PyPI's test platform
  9. Run twine upload dist/* to upload it to PyPI
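Before tagging and uploading, it can help to confirm that the tag agrees with the version number set in the package. The check below is a minimal sketch, not part of the project's tooling: `tag_matches_version` is a hypothetical helper, and it assumes tags take the form `v<major>.<minor>.<patch>` with numeric parts only.

```python
import re

# Assumed tag shape: a literal "v" followed by three dot-separated integers.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def tag_matches_version(tag: str, version: str) -> bool:
    """Return True if tag looks like 'v1.2.3' and its numeric part
    is exactly equal to the given version string."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return False
    return ".".join(match.groups()) == version
```

For example, `tag_matches_version("v1.2.0", "1.2.0")` is true, while a tag without the leading `v` or with only two components is rejected.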

Version History

  • v1.2.0 - add support for new expanded results, and more integration testing
  • v1.1.0 - add new paged_articles method to allow paging over all results
  • v1.0.3 - add 30 sec timeout, remove extra params mcproviders library might be adding
  • v1.0.2 - fix to article endpoint
  • v1.0.1 - automatically escape '/' in query strings, test case for url field search
  • v1.0.0 - update to public API endpoint
  • v0.1.5 - simpler return for top terms
  • v0.1.4 - better error handling
  • v0.1.3 - allow overriding base api URL
  • v0.1.2 - fix article endpoint, test case for fetching content (snippet) via article_url property
  • v0.1.1 - more consistent method names
  • v0.1.0 - initial test-only release