jeroen/json-dump-data

Wikibase JSON dump data for testing


Keywords
json, dump, wikibase, wikidata
License
GPL-2.0+

Documentation

JsonDumpData


JsonDumpData holds extracts of Wikibase Repository JSON dumps.

Such extracts are often needed when testing code interacting with dumps and can also be useful to import realistic data into a test wiki. Adding the extracts to the VCS repository that holds the code needing them can unreasonably inflate its size. Loading this component via Composer works around that problem.

For more information on the JSON dump format, see the Wikidata database download page: https://www.wikidata.org/wiki/Wikidata:Database_download

Installation

To add this package as a per-project dependency, add jeroen/json-dump-data to the "require" section of your project's composer.json file. Here is a minimal example of a composer.json file that only defines a dependency on JsonDumpData 1.0.x:

{
    "require": {
        "jeroen/json-dump-data": "1.0.*"
    }
}

Usage

The dump extracts are stored in the data directory, and have stable paths. However, when using PHP, it is recommended to obtain the paths via the JsonDumpData class.

$dumpData = new JsonDumpData();
// Full path to the extract holding five entities
$path = $dumpData->getFiveEntitiesDumpPath();

Each method in this class returns the full path to the relevant file. The following methods return the path to the most recent copy of the data this library holds:

  • getOneItemDumpPath
  • getFiveEntitiesDumpPath
  • getOneThousandEntitiesDumpPath
  • getEmptyDumpPath

This means that new versions of the library can have these methods return paths to files with different content (though always adhering to the contract of the method). To get a fully stable path to a specific version, or to get one for an older version, you can use the methods with time qualification:

  • getOneItemFrom2015DumpPath
  • getFiveEntitiesFrom2014DumpPath
  • ...
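Once a path has been obtained, the extract can be read one entity at a time. The following is a minimal sketch, not part of this library: the function name readDumpEntities is hypothetical, and it assumes the extract follows the usual Wikibase dump layout, that is, a single JSON array with one entity per line and a trailing comma after every entity but the last.

```php
<?php
// Hypothetical helper, not part of JsonDumpData.
// Assumption: the file is a JSON array with "[" and "]" on their own lines
// and one entity per line in between, each followed by a comma except the last.
function readDumpEntities(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open dump file: $path");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            // Drop surrounding whitespace and the comma that separates entities.
            $line = rtrim(trim($line), ',');

            // Skip blank lines and the array brackets.
            if ($line === '' || $line === '[' || $line === ']') {
                continue;
            }

            // Yield only lines that decode to a non-empty JSON object or array.
            $entity = json_decode($line, true);
            if (is_array($entity) && $entity !== []) {
                yield $entity;
            }
        }
    } finally {
        fclose($handle);
    }
}
```

In a test, the generator could be consumed with a foreach loop over readDumpEntities($dumpData->getFiveEntitiesDumpPath()). Reading line by line keeps memory use flat, which matters more for the one thousand entity extract than for the smaller ones.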

Compressed dumps can be accessed via methods that follow the same naming scheme, with the file extension inserted before "DumpPath". Currently bz2 (bzip2) and gz (gzip) are included.

  • getFiveEntitiesBz2DumpPath
  • getOneThousandEntitiesFrom2015GzDumpPath
  • ...
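The compressed extracts can be read without unpacking them first by using PHP's compression stream wrappers. The following is a sketch under assumptions: the function name openDump is hypothetical and not part of this library, and it assumes the returned paths end in .gz or .bz2. The compress.zlib:// wrapper requires the zlib extension and compress.bzip2:// requires the bz2 extension.

```php
<?php
// Hypothetical helper, not part of JsonDumpData.
// Opens a dump for reading and picks a stream wrapper from the file extension.
// Assumption: compressed extracts have paths ending in ".gz" or ".bz2".
function openDump(string $path)
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // Map of file extensions to PHP's built-in compression stream wrappers.
    $wrappers = [
        'gz'  => 'compress.zlib://',
        'bz2' => 'compress.bzip2://',
    ];

    // Uncompressed files are opened directly, without a wrapper prefix.
    $prefix = $wrappers[$extension] ?? '';

    $handle = fopen($prefix . $path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open dump file: $path");
    }

    return $handle;
}
```

The returned handle behaves like any other read-only stream, so fgets and stream_get_contents return the decompressed content.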

Release notes

Version 1.0.0 (2015-11-11)

  • Added files from 2015-11-09 dump
  • Added bz2 files
  • Added gz files
  • Added new path getters for the files from 2014

Version 0.1.0 (2014-10-22)

  • Initial release