wikistats2csv

Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.


Keywords: wikipedia, statistics, wikimedia, csv, cli, human-languages, nlp, python
License: MIT
Install: pip install wikistats2csv==0.1.6

Documentation

Wikistats-to-CSV

Install:

Wikistats-to-CSV (wikistats2csv) requires Python 3 or later and a few Python packages such as lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the Wikistats-to-CSV setup process. If you encounter installation errors, you may need to install them manually with pip:

python3 -m pip install -r requirements.txt

To install Wikistats-to-CSV (wikistats2csv) with pip, we highly recommend upgrading pip to the latest version first:

python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv

If you see the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to the $PATH variable with this command:

export PATH="/Users/.../.../bin:$PATH"
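
To confirm that the install worked and that the command is reachable, a short check along these lines may help. It is a minimal sketch that uses only the Python standard library (Python 3.8 or later for `importlib.metadata`); the helper names `installed_version` and `find_cli` are illustrative and are not part of wikistats2csv.

```python
import shutil
from importlib import metadata


def installed_version(package="wikistats2csv"):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_cli(command="wikistats2csv"):
    """Return the full path of a command if it is on PATH, else None."""
    return shutil.which(command)


# Both values are None when the package or the command cannot be found.
print("installed version:", installed_version())
print("command location:", find_cli())
```

If the version is reported but the command location is `None`, the package is installed and its script directory is probably missing from $PATH, which is the situation the warning above describes.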

Usage:

* As CLI:

>> Long Flags:

$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                 28             37  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                 51            175  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z           36945305        6518484  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z           37088260        6534151  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z

>> Short Flags:

$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0            591  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0            591  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            5508072        1173410  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            5538121        1180401  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
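
The same download can be scripted from Python by assembling the long flags shown above into an argument list. The sketch below is only an illustration: `build_command`, `expected_filename`, and `download` are hypothetical helpers, and the file-name pattern (language name, query, filter, period, interval) is inferred solely from the two output messages above, so it may not hold for every query.

```python
import subprocess


def build_command(wiki, metric, query, period, filter_, interval):
    """Assemble the long-flag form of the command as an argument list."""
    return [
        "wikistats2csv",
        "--wiki", wiki,
        "--metric", metric,
        "--query", query,
        "--period", period,
        "--filter", filter_,
        "--interval", interval,
    ]


def expected_filename(language, query, filter_, period, interval):
    """Guess the CSV name from the pattern seen in the sample output (an inference)."""
    return f"{language}--{query}--{filter_}--{period}--{interval}.csv"


def download(**options):
    """Run the CLI; this needs wikistats2csv installed and on PATH."""
    subprocess.run(build_command(**options), check=True)


command = build_command("en", "content", "pages-to-date", "all-years", "page-type-all", "monthly")
print(" ".join(command))
print(expected_filename("english", "pages-to-date", "page-type-all", "all-years", "monthly"))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues; `download` is defined but not called here, so the sketch runs without network access.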

* As Python Package:

>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')

## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0              0  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0              0  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            3896209        1786321  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            3903963        1792329  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
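
Once a file has been downloaded, it can be read back for analysis. The following is a minimal sketch using only the standard library. It assumes the CSV is comma-separated and has a header row with the column names shown in the quick glance above; `parse_timestamp`, `read_rows`, and `SAMPLE` are illustrative names, and `SAMPLE` reuses the last two Spanish rows from the output above in place of a real file.

```python
import csv
import io
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse timestamps such as 2022-07-01T00:00:00.000Z into aware UTC datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def read_rows(handle):
    """Read the month and the two page counts from an open CSV file object."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "month": parse_timestamp(record["month"]),
            "non_content": int(record["total.non-content"]),
            "content": int(record["total.content"]),
        })
    return rows


# Stand-in for a downloaded file (assumed layout, values copied from the output above).
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2022-06-01T00:00:00.000Z,3896209,1786321,2022-06-01T00:00:00.000Z,2022-07-01T00:00:00.000Z
2022-07-01T00:00:00.000Z,3903963,1792329,2022-07-01T00:00:00.000Z,2022-08-01T00:00:00.000Z
"""

rows = read_rows(io.StringIO(SAMPLE))
growth = rows[-1]["content"] - rows[-2]["content"]
print(f"content pages added in the last month: {growth}")
```

To read a real download, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`. Because the columns are looked up by name, an extra index column in the file would not affect the result.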

Supported Features:

Content Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| absolute-bytes-difference* / absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| edited-pages* / edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| net-bytes-difference* / net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| pages-to-date* / pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| total-media-requests* / total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
| top-media-requests* / top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |

* CLI queries.  ** Python functions.  *** More complex filters are coming in future versions.
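
A table like this can be turned into a lookup that checks a combination of options before a download is attempted. The sketch below transcribes only the two media-request rows by hand; `CONTENT_OPTIONS` and `check_options` are hypothetical names, and wikistats2csv itself may validate its arguments differently.

```python
PERIODS = ("all-years", "one-year", "two-years", "three-months", "one-month")
INTERVALS = ("daily", "monthly")
MEDIA_TYPES = (
    "no-filter", "media-type-image", "media-type-video", "media-type-audio",
    "media-type-document", "media-type-other", "media-type-all",
)
AGENT_TYPES = ("agent-type-user", "agent-type-spider", "agent-type-all")

# Two rows of the Content table, copied by hand (not read from the package).
CONTENT_OPTIONS = {
    "total-media-requests": {
        "period": PERIODS,
        "filter": MEDIA_TYPES + AGENT_TYPES,
        "interval": INTERVALS,
    },
    "top-media-requests": {
        "period": ("last-month",),
        "filter": MEDIA_TYPES,
        "interval": INTERVALS,
    },
}


def check_options(query, period, filter_, interval):
    """Return a list of problems; an empty list means the table lists the combination."""
    if query not in CONTENT_OPTIONS:
        return [f"unknown query: {query}"]
    allowed = CONTENT_OPTIONS[query]
    chosen = {"period": period, "filter": filter_, "interval": interval}
    return [
        f"{name} '{value}' is not listed for {query}"
        for name, value in chosen.items()
        if value not in allowed[name]
    ]


# top-media-requests only lists the last-month period, so this reports one problem.
print(check_options("top-media-requests", "all-years", "media-type-image", "monthly"))
```

Returning a list of messages instead of raising on the first mismatch lets a caller report every invalid option at once.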

Contributing Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| editors* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| active-editors* / active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| edits* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| user-edits* / user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| new-pages* / new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| new-registered-users* / new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
| top-editors* / top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| top-edited-pages* / top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| active-editors-by-country* / active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |

* CLI queries.  ** Python functions.  *** More complex filters are coming in future versions.

Reading Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| total-page-views* / total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
| legacy-page-views* / legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| page-views-by-country* / page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
| unique-devices* / unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| top-viewed-articles* / top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |

* CLI queries.  ** Python functions.  *** More complex filters are coming in future versions.
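
Each Python function name listed above is the CLI query name with hyphens replaced by underscores, so a query string can be mapped to its function mechanically. The sketch below shows the idea with a stand-in class, `FakeReading`, so that it runs without the package or network access. `to_function_name` and `call_query` are hypothetical helpers, and using them with the real classes is an assumption that goes beyond the single `Content.pages_to_date` call shown in the usage section.

```python
def to_function_name(query):
    """Convert a CLI query name such as 'total-page-views' to 'total_page_views'."""
    return query.replace("-", "_")


def call_query(metric_class, query, **options):
    """Look up the function for a CLI query name on a class and call it."""
    function = getattr(metric_class, to_function_name(query), None)
    if function is None:
        raise ValueError(f"{metric_class.__name__} has no function for query '{query}'")
    return function(**options)


class FakeReading:
    """Stand-in for a metrics class; it only echoes its arguments."""

    @staticmethod
    def total_page_views(wiki, period, filter, interval):
        return f"{wiki}/{period}/{filter}/{interval}"


result = call_query(
    FakeReading, "total-page-views",
    wiki="en", period="one-year", filter="no-filter", interval="daily",
)
print(result)
```

The keyword names `wiki`, `period`, `filter`, and `interval` mirror the `Content.pages_to_date` call in the usage section; whether every function accepts exactly these keywords is not confirmed here.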

Extra Features:

List All Wikipedia Languages with Their Codes:

* As CLI:

To print the full list of supported Wikipedia languages with their codes, run one of these commands:

$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis

* As Python Package:

from wikistats2csv import Helper
Helper.get_Wikis_Codes()