eprinttools

Command line tools, Golang package and Python module for working with the EPrints 3.x REST API


Keywords
GitHub, metadata, repository, EPrints, CrossRef, DataCite, software
License
Other
Install
pip install eprinttools==0.1.7

Documentation

eprinttools

This is a collection of command line tools and a web service written in Go for working with EPrints 3.3.x EPrint XML, the EPrints REST API, and directly with the EPrints MySQL repository database(s). Caltech Library uses it to render the https://feeds.library.caltech.edu website and to migrate content into a new repository system. Some of the command line tools may be of more general interest, while others are specific to Caltech Library's needs. Much of the test code presumes access to our repositories, so it is specific to our needs.

Go base code

The programs:

  • eputil is a command line utility for interacting with EPrints' REST API (e.g. harvesting JSON and XML)
    • minimal configuration (because it does so much less!)
  • epfmt is a command line utility to pretty print EPrints XML and convert it to/from JSON, including a simplified JSON inspired by DataCite and Invenio 3
  • doi2eprintxml is a command line program for turning metadata harvested from CrossRef and DataCite into an EPrint XML document based on one or more supplied DOIs
  • ep3apid is a Unix style web service for interacting with an EPrint repository via a localhost proxy. It includes the ability to get restricted key lists as well as retrieve a simplified JSON record representing an EPrints record
  • ep3harvester is an EPrints 3.x metadata harvesting tool working at the MySQL 8 level for EPrints content. It harvests the contents into a MySQL 8 database, one table per EPrints repository, storing the harvested metadata in JSON columns. This tool can also harvest CSV files with information for people and groups referenced in the EPrints repositories.
  • ep3genfeeds is used to generate the JSON documents that drive our feeds website.
  • ep3datasets is a tool to generate dataset collections from previously harvested EPrints repositories
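
To make the harvest-and-convert workflow behind eputil and epfmt more concrete, here is a minimal Python sketch using only the standard library. It is not the eprinttools implementation or its API. The function names (record_url, eprint_xml_to_dicts), the example host, the sample record, the "/rest/eprint/<id>.xml" path layout, and the XML namespace are all assumptions made for illustration. The sketch flattens only simple text fields and ignores nested ones such as creators.

```python
import json
import xml.etree.ElementTree as ET

# Namespace assumed for EPrints 3.x EPrint XML exports (an assumption of this sketch).
EP_NS = {"ep": "http://eprints.org/ep2/data/2.0"}

# Hypothetical sample record; host and field values are illustrative only.
SAMPLE_XML = """<eprints xmlns="http://eprints.org/ep2/data/2.0">
  <eprint id="https://eprints.example.edu/id/eprint/1">
    <eprintid>1</eprintid>
    <title>An example title</title>
    <type>article</type>
  </eprint>
</eprints>
"""


def record_url(base_url, eprint_id):
    """Build the REST URL for one record (the path layout is an assumption)."""
    return f"{base_url.rstrip('/')}/rest/eprint/{eprint_id}.xml"


def eprint_xml_to_dicts(xml_text):
    """Flatten the simple text fields of each <eprint> element into a dict."""
    root = ET.fromstring(xml_text)
    records = []
    for eprint in root.findall("ep:eprint", EP_NS):
        record = {}
        for child in eprint:
            # ElementTree prefixes tag names with "{namespace}"; strip it.
            key = child.tag.split("}", 1)[-1]
            # Keep only leaf elements that carry text; skip nested fields.
            if len(child) == 0 and child.text is not None:
                record[key] = child.text.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    print(record_url("https://eprints.example.edu", 1))
    print(json.dumps(eprint_xml_to_dicts(SAMPLE_XML), indent=2))
```

A real harvest would fetch the document at record_url(...) with credentials that have REST access and then pass the response body to eprint_xml_to_dicts.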

Use cases

Two primary use cases have driven development of eprinttools:

  1. Reusing the metadata and content in our EPrints 3.3.16 repositories (see Caltech Library Feeds).
  2. Populating our EPrints repository from standardized data sources (see Acacia Project).

Related GitHub projects

  • py_dataset: this Python module provides access to dataset collections, which we use as intermediate storage for JSON documents and related attachments.
  • AMES: the eprinttools command line programs have been made available to Python via the AMES project. This includes support for both reading from and writing to EPrints repository systems.