wikidatasets

Break WikiData dumps into smaller knowledge graphs


Keywords
wikidatasets
License
BSD-3-Clause
Install
pip install wikidatasets==0.2.0

Documentation

WikiDataSets


Breaking WikiData dumps into smaller knowledge graphs (e.g. the graph of human entities).

Data Sets

Data sets are available on this page.

Features

This is a non-exhaustive list of useful functions:

  • wikidatasets.processFunctions.get_subclasses : Gets a list of the WikiData IDs of all entities that are subclasses of a given entity.
  • wikidatasets.processFunctions.query_wikidata_dump : Goes through a WikiData dump. It can collect entities that are instances of test_entities, collect the dictionary of labels, or do both.
  • wikidatasets.processFunctions.build_dataset : Builds datasets from the pickle files produced by query_wikidata_dump.
  • wikidatasets.utils.load_data_labels : Loads the edges and attributes files into Pandas dataframes and merges in the labels of entities and relations.
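To illustrate the kind of label merge that load_data_labels performs, here is a minimal self-contained sketch using pandas directly. The column names and toy data are illustrative assumptions, not the package's actual file schema:

```python
import pandas as pd

# Hypothetical miniature of a dataset's contents: an edge list keyed by
# numeric IDs, plus label tables for entities and relations.
edges = pd.DataFrame({
    "headEntity": [0, 1],
    "tailEntity": [1, 2],
    "relation": [0, 0],
})
entity_labels = pd.DataFrame({
    "entityID": [0, 1, 2],
    "label": ["Douglas Adams", "writer", "human"],
})
relation_labels = pd.DataFrame({
    "relationID": [0],
    "label": ["occupation"],
})

# Merge the labels onto the edge list to get a human-readable view.
readable = (
    edges
    .merge(entity_labels.rename(columns={"entityID": "headEntity", "label": "headLabel"}),
           on="headEntity")
    .merge(entity_labels.rename(columns={"entityID": "tailEntity", "label": "tailLabel"}),
           on="tailEntity")
    .merge(relation_labels.rename(columns={"relationID": "relation", "label": "relationLabel"}),
           on="relation")
)
print(readable[["headLabel", "relationLabel", "tailLabel"]])
```

Each row of the result reads as a labeled triple, e.g. (Douglas Adams, occupation, writer).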

The example/ folder contains example scripts for creating datasets (e.g. build_humans.py). Such scripts should be placed in the main directory (alongside utils.py and processFunctions.py), and their hard-coded paths should be adjusted to match your installation.

Citations

If you find this code useful in your research, please consider citing our paper:

@misc{arm2019wikidatasets,
    title={WikiDataSets: Standardized sub-graphs from WikiData},
    author={Armand Boschin},
    year={2019},
    eprint={1906.04536},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.