obscraper: scrape posts from the overcomingbias blog


Keywords
api, bs4, overcomingbias
License
MIT
Install
pip install obscraper==0.8.0



obscraper lets you scrape blog posts and associated metadata from the overcomingbias blog.

It's easy to get a single post:

>>> import obscraper
>>> intro_url = 'https://www.overcomingbias.com/2006/11/introduction.html'
>>> post = obscraper.get_post_by_url(intro_url)
>>> post.title
'How To Join'
>>> post.plaintext
'How can we better believe what is true? ...'
>>> post.internal_links
{'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1,
'http://www.overcomingbias.com/2006/12/contributors_be.html': 1}
>>> post.comments
20

Or a full list of post URLs and edit dates:

>>> import obscraper
>>> edit_dates = obscraper.get_edit_dates()
...
>>> len(edit_dates)
4352
>>> {url: str(edit_dates[url]) for url in list(edit_dates)[:5]}
{'2022/01/much-talk-is-sales-patter':
'2022-01-14 20:46:35+00:00',
'2022/01/old-man-rant':
'2022-01-13 15:21:33+00:00',
'2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022':
'2022-01-12 19:15:10+00:00',
'2022/01/to-innovate-unify-or-fragment':
'2022-01-11 01:03:44+00:00',
'2022/01/on-what-is-advice-useful':
'2022-01-10 18:46:26+00:00'}
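
The returned mapping can be filtered with ordinary dictionary operations. The sketch below assumes the values are timezone-aware `datetime` objects, which the printed output above suggests but does not confirm. `edited_since` is a hypothetical helper, not part of obscraper, and `sample_edit_dates` is a hand-built stand-in for a real `get_edit_dates()` result so that the example runs without network access.

```python
from datetime import datetime, timezone


def edited_since(edit_dates, cutoff):
    """Return the entries edited at or after cutoff, newest first.

    Hypothetical helper for illustration; not part of obscraper.
    """
    recent = {name: when for name, when in edit_dates.items() if when >= cutoff}
    return dict(sorted(recent.items(), key=lambda item: item[1], reverse=True))


# Stand-in for obscraper.get_edit_dates(), copied from the output shown above.
# Assumption: the real values are timezone-aware datetime objects.
sample_edit_dates = {
    "2022/01/much-talk-is-sales-patter": datetime(2022, 1, 14, 20, 46, 35, tzinfo=timezone.utc),
    "2022/01/old-man-rant": datetime(2022, 1, 13, 15, 21, 33, tzinfo=timezone.utc),
    "2022/01/on-what-is-advice-useful": datetime(2022, 1, 10, 18, 46, 26, tzinfo=timezone.utc),
}

cutoff = datetime(2022, 1, 12, tzinfo=timezone.utc)
recent = edited_since(sample_edit_dates, cutoff)
print(list(recent))
```

With a real result, `edited_since(obscraper.get_edit_dates(), cutoff)` would be the equivalent call, provided the assumption about the value type holds.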

Features

  • Get posts by their URLs or edit dates, or get all posts hosted on the overcomingbias site
  • Get detailed post metadata, including post URLs, titles, authors, tags, publish dates, and last edit dates
  • Get a summary of post content, including the full post text as HTML or plaintext and a list of hyperlinks to other overcomingbias posts
  • Asynchronous execution and caching for fast downloads
  • Use via import obscraper or the simple command line interface
  • Comprehensively tested
  • Supports Python 3.8+
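
The hyperlink list mentioned above appears in the first example as `post.internal_links`, a mapping from linked URL to what looks like an occurrence count. Those links use `http://` while the post URL in the same example uses `https://`, so comparing the two may require normalizing the scheme first. The following is a minimal sketch using only the standard library; `to_https` is a hypothetical helper, not part of obscraper, and `internal_links` is copied from the example output rather than fetched.

```python
from collections import Counter
from urllib.parse import urlsplit


def to_https(url):
    """Return url with its scheme replaced by https (hypothetical helper)."""
    return urlsplit(url)._replace(scheme="https").geturl()


# Copied from the post.internal_links output in the first example.
internal_links = {
    "http://www.overcomingbias.com/2007/02/moderate_modera.html": 1,
    "http://www.overcomingbias.com/2006/12/contributors_be.html": 1,
}

# Sum the counts so that http and https variants of one URL are merged.
normalized = Counter()
for url, count in internal_links.items():
    normalized[to_https(url)] += count

print(dict(normalized))
```

Summing into a `Counter` rather than building a plain dictionary keeps the counts correct if two keys differ only by scheme.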

Documentation

Read the full documentation here, including the Installation and Getting Started Guide and the Public API Reference.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request features.

Changelog

See the Changelog for a list of fixes and enhancements at each version.

License

Copyright (c) 2022 Christopher McDonald

Distributed under the terms of the MIT license.

All overcomingbias posts are copyright the original authors.