drafttopic

A library for automatically detecting the topics of new Wikipedia drafts based on WikiProjects.


Keywords: artificial-intelligence
License: MIT
Install:

pip install drafttopic==0.4.1

Documentation

Draft topic

Predicting topics for new drafts on English Wikipedia based on WikiProjects.

Setting up

Make sure you have a working Python 3 environment. Install the requirements using:

pip install -r requirements.txt

Install the library using:

python setup.py install

Generating machine-readable WikiProjects data

Run the following utility from the root directory to generate machine-readable WikiProjects data:

./utility fetch_wikiprojects --output <output_file_name.json>

Generating a mid-level category to WikiProjects mapping

Run the following utility from the root directory to generate a mapping from mid-level topic categories to the lists of WikiProjects they contain:

./utility trim_wikiprojects --wikiprojects wp --output outmid
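The fetch_page_wikiprojects step uses this file to map WikiProjects to mid-level categories, which means inverting the category-to-WikiProjects mapping. The sketch below shows that inversion, assuming the output is a flat JSON object of the form {category: [WikiProject, ...]}. The real trim_wikiprojects output may be nested differently, and the function names are hypothetical, not part of drafttopic's API.

```python
from __future__ import annotations

import json
from collections import defaultdict


def invert_mid_level_mapping(mid_level: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn {mid-level category: [WikiProject, ...]} into
    {WikiProject: [mid-level category, ...]}.

    Assumes a flat dict-of-lists shape (hypothetical; check the real file).
    """
    wikiproject_to_categories: dict[str, list[str]] = defaultdict(list)
    for category, wikiprojects in mid_level.items():
        for wikiproject in wikiprojects:
            # A WikiProject may belong to more than one mid-level category.
            if category not in wikiproject_to_categories[wikiproject]:
                wikiproject_to_categories[wikiproject].append(category)
    return dict(wikiproject_to_categories)


def load_and_invert(path: str) -> dict[str, list[str]]:
    """Read the mapping file (e.g. outmid.json) and invert it."""
    with open(path, encoding="utf-8") as f:
        return invert_mid_level_mapping(json.load(f))
```

For example, with the hypothetical categories {"Culture.Music": ["WikiProject Jazz"], "Culture.Arts": ["WikiProject Jazz"]}, "WikiProject Jazz" maps to both categories, in the order they appear in the file.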

Labeling a list of page IDs with the WikiProjects and mid-level categories each page belongs to

Run the following utility from the root directory to label a list of page IDs with the WikiProjects and mid-level categories each page belongs to:

./utility fetch_page_wikiprojects --api-host=https://en.wikipedia.org/ --input=wikiproject_page_ids.json --output=enwiki.labeled_wikiprojects.json --mid_level_wp=outmid.json --verbose

Here, the input should be a JSON file containing a list of observations, each with a page_id: <page-id> mapping. Also pass the mid-level WikiProjects JSON (via --mid_level_wp) so the script can map each page's WikiProjects to mid-level categories. The script adds these fields to each observation and writes the augmented list to the new file given by --output.
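A minimal sketch of preparing the input file and of what the augmentation step amounts to. It follows the description above (a JSON list of {"page_id": ...} observations). The "wikiprojects" and "mid_level_categories" keys, and all function names, are hypothetical placeholders rather than the script's actual field names. Some revscoring-family tools read one JSON object per line instead of a single list, so confirm which layout fetch_page_wikiprojects expects.

```python
import json


def build_observations(page_ids):
    """Wrap raw page IDs as a list of {"page_id": <int>} observations."""
    return [{"page_id": int(page_id)} for page_id in page_ids]


def write_observations(page_ids, path):
    """Write the observations as a single JSON list (assumed input layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_observations(page_ids), f)


def add_mid_level_labels(observation, wikiproject_to_categories):
    """Hypothetical illustration of the augmentation: given an observation
    whose WikiProjects are listed under "wikiprojects", add the sorted set
    of matching mid-level categories under "mid_level_categories"."""
    categories = set()
    for wikiproject in observation.get("wikiprojects", []):
        categories.update(wikiproject_to_categories.get(wikiproject, []))
    # Return a new dict so the input observation is left unchanged.
    return {**observation, "mid_level_categories": sorted(categories)}
```

For instance, write_observations(["12", "34"], "wikiproject_page_ids.json") would produce the file passed to --input in the command above.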

Generating predictions for a set of pages on Wikipedia

To generate topic predictions for a set of revision IDs, download the relevant model and use revscoring's score API. The revision IDs must be in a file whose format is specified by that API. For a good prediction for a page, use the ID of its most recent revision.
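Because predictions should use each page's most recent revision, the sketch below looks up the current revision ID for a batch of page IDs through the MediaWiki Action API (prop=revisions with rvprop=ids, which returns the latest revision when several pages are queried at once). It does not call revscoring. Writing the IDs into the file format revscoring expects is left to its documentation. The User-Agent string and function names are placeholders. The API caps pageids per request (typically 50 for ordinary clients), so larger lists need batching.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks clients to send a descriptive User-Agent; replace this placeholder.
USER_AGENT = "drafttopic-example/0.1 (placeholder contact)"


def latest_revision_params(page_ids):
    """Query parameters requesting the current revision ID of each page."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "pageids": "|".join(str(page_id) for page_id in page_ids),
        "format": "json",
        "formatversion": "2",
    }


def parse_latest_revisions(response):
    """Map page_id -> latest rev_id from a formatversion=2 response."""
    latest = {}
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if revisions:  # missing or invalid pages carry no "revisions" key
            latest[page["pageid"]] = revisions[0]["revid"]
    return latest


def fetch_latest_revisions(page_ids):
    """Perform the HTTP request (needs network access)."""
    query = urllib.parse.urlencode(latest_revision_params(page_ids))
    request = urllib.request.Request(API_URL + "?" + query, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as resp:
        return parse_latest_revisions(json.load(resp))
```

Parsing is kept separate from fetching so the response handling can be checked offline against a saved API response.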