ludoj-scraper
Release 1.10.18

Board games data scraping and processing

Keywords: board games, data, scraper
License: CNRI-Python-GPL-Compatible
Install: pip install ludoj-scraper==1.10.18

Documentation

board-game-scraper

Scrape data about board games from the web. View the data live at Recommend.Games! Install via

pip install board-game-scraper

Sources

Board Game Atlas (bga)
BoardGameGeek (bgg)
DBpedia (dbpedia)
Luding.org (luding)
Spielen.de (spielen)
Wikidata (wikidata)

Run scrapers

Requires Python 3. Make sure Pipenv is installed and create the virtual environment:

python3 -m pip install --upgrade pipenv
pipenv install --dev
pipenv shell

Run a spider like so:

JOBDIR="jobs/${SPIDER}/$(date --utc +'%Y-%m-%dT%H-%M-%S')"
scrapy crawl "${SPIDER}" \
    --output 'feeds/%(name)s/%(time)s/%(class)s.csv' \
    --set "JOBDIR=${JOBDIR}"

where $SPIDER is one of the IDs given in parentheses under Sources above, for example bgg.
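For readers who prefer to launch a crawl from Python, the shell snippet above can be sketched as a small helper that assembles the same arguments. This is only an illustration: `build_crawl_command` and the `SPIDERS` tuple are hypothetical names that are not part of the project, and the output template is copied unchanged from the command above.

```python
from datetime import datetime, timezone

# Spider IDs copied from the Sources list above.
SPIDERS = ("bga", "bgg", "dbpedia", "luding", "spielen", "wikidata")


def build_crawl_command(spider, now=None):
    """Return the `scrapy crawl` argument list for one spider (hypothetical helper)."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider ID: {spider!r}")
    now = now or datetime.now(timezone.utc)
    # Same timestamp layout as the shell snippet, e.g. 2024-01-31T12-00-00 (UTC).
    job_dir = f"jobs/{spider}/{now.strftime('%Y-%m-%dT%H-%M-%S')}"
    return [
        "scrapy", "crawl", spider,
        "--output", "feeds/%(name)s/%(time)s/%(class)s.csv",
        "--set", f"JOBDIR={job_dir}",
    ]
```

The returned list can be passed to `subprocess.run(build_crawl_command("bgg"), check=True)` from inside the Pipenv shell. Building a list instead of a single string avoids any shell quoting of the `%(name)s` placeholders.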

Run all the spiders with the run_all.sh script. Get a list of the running scrapers' PIDs with the processes.sh script. You can stop all the running scrapers via

./processes.sh stop

and resume them later.
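Scrapy stores crawl state in the JOBDIR, and a stopped crawl only resumes when it is started again with that same directory. Since the single-spider command above creates a new timestamped directory on every run, resuming by hand means pointing JOBDIR at an existing one. The sketch below finds the most recent job directory for a spider. It assumes the `jobs/<spider>/<timestamp>` layout from the command above, and `latest_job_dir` is a hypothetical helper. How run_all.sh handles resumption is not described here.

```python
from pathlib import Path


def latest_job_dir(spider, root="jobs"):
    """Return the newest job directory for a spider, or None if there is none."""
    spider_dir = Path(root) / spider
    if not spider_dir.is_dir():
        return None
    # Names like 2024-01-31T12-00-00 sort chronologically as plain strings.
    candidates = sorted(
        (p for p in spider_dir.iterdir() if p.is_dir()),
        key=lambda p: p.name,
    )
    return candidates[-1] if candidates else None
```

Passing the returned path as `--set "JOBDIR=..."` in place of a fresh timestamp should let Scrapy continue from the saved state.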

Tests

You can run scrapy check to perform contract tests for all spiders, or scrapy check $SPIDER to test one particular spider. If tests fail, the website has most likely changed and the spider needs updating.
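To see which spiders need attention, the per-spider check can be looped over all IDs. The following is a minimal sketch, not part of the project: `failing_spiders` is a hypothetical helper, and it assumes that `scrapy check` exits with a non-zero status when a contract fails.

```python
import subprocess

# Spider IDs copied from the Sources list above.
SPIDERS = ("bga", "bgg", "dbpedia", "luding", "spielen", "wikidata")


def failing_spiders(spiders=SPIDERS, run=subprocess.run):
    """Run `scrapy check` per spider and return the IDs whose checks failed."""
    failed = []
    for spider in spiders:
        # Assumption: a non-zero exit status signals failed contracts.
        result = run(["scrapy", "check", spider])
        if result.returncode != 0:
            failed.append(spider)
    return failed
```

The `run` parameter exists so that the loop can be exercised with a stand-in runner, without a network connection. When called with the default it invokes the real command, so it has to run inside the project's virtual environment.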

Board game datasets

If you are interested in using any of the datasets produced by this scraper, take a look at the BoardGameGeek guild. A subset of the data can also be found on Kaggle.
