doi2bibtex

Resolve DOIs and arXiv identifiers to formatted BibTeX entries


License: BSD-3-Clause
Install: pip install doi2bibtex==0.2.0


Python versions: 3.8+ · Type annotations checked with MyPy · Code style: black

doi2bibtex is a small Python package that resolves DOIs (and other identifiers) into BibTeX entries and formats them according to a customizable set of rules (see below for a full list of features).

A GIF showing how to use doi2bibtex in the command line

Most features of doi2bibtex are available in other tools. For example, you can chain together doi2bib with bibtool or bibtex-tidy and recover most of the functionality in this package (and some of these tools are actually used under the hood). If you use a reference manager like Zotero or Mendeley, you can also resolve papers based on an identifier and later export entries to a .bib file.

The motivation for doi2bibtex was rather personal and came from two facts: 1. I have a rather strong opinion on how I want my .bib files to look, and 2. I work at the intersection of astrophysics and machine learning, meaning that I often need the NASA/ADS bibcodes for the adsurl field, but I can’t rely solely on ADS to retrieve BibTeX entries because I also frequently cite papers that are not indexed by ADS. At some point, I got tired of the ever-growing mess of shell scripts and bash commands that I used to achieve this, and decided to rewrite it as a single package that would be easier to maintain and extend.

🚀 Quickstart

Follow these instructions to get started with doi2bibtex:

🤓 Installation

You can simply pip-install the package using:

pip install doi2bibtex

Alternatively, you can also clone the repository and install the package locally:

git clone https://github.com/timothygebhard/doi2bibtex.git
cd doi2bibtex
pip install .

🔑 Setting up an API key for ADS

If you want to use the ads backend to resolve the adsurl (a feature which is enabled by default), you need to create an ADS account (if you do not already have one) and set up an API token to be able to query ADS. You can do this in two ways:

  1. Set the environment variable ADS_TOKEN to your API key:

    export ADS_TOKEN="your-token";

    Ideally, you should add this line to your .bashrc or .zshrc file.

  2. Create a file ~/.doi2bibtex/ads_token and put your API key in there.
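
To make the two options concrete, here is a minimal Python sketch of how a token lookup along these lines could work. The function name find_ads_token is hypothetical and not part of doi2bibtex, and checking the environment variable before the file is an assumption; the package may resolve the token differently.

```python
import os
from pathlib import Path
from typing import Optional


def find_ads_token(
    token_file: Path = Path.home() / ".doi2bibtex" / "ads_token",
) -> Optional[str]:
    """Return an ADS token from ADS_TOKEN or from a token file, else None.

    Hypothetical helper: the lookup order (environment first, file second)
    is an assumption made for this sketch.
    """
    # Option 1: the ADS_TOKEN environment variable
    token = os.environ.get("ADS_TOKEN", "").strip()
    if token:
        return token

    # Option 2: a plain-text file containing only the token
    if token_file.is_file():
        return token_file.read_text().strip() or None

    return None
```

Calling find_ads_token() with no arguments looks at ~/.doi2bibtex/ads_token; a None result means neither option has been set up yet.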

💻 Using the command line interface

Once installed, using the package is as simple as running the d2b command in your terminal:

d2b <doi-or-arxiv_id>

You can also add the --plain flag to output only the BibTeX entry without any fancy formatting. This can be useful if you, for example, want to pipe the output of the d2b command to another program.
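
As an illustration of using the plain output from another program, the following Python sketch calls d2b through subprocess and appends the result to a .bib file. It assumes that d2b is on your PATH, that the --plain flag may be placed before the identifier, and that the entry is written to standard output. The helper names build_d2b_command and append_entry are hypothetical.

```python
import subprocess
from typing import List


def build_d2b_command(identifier: str, plain: bool = True) -> List[str]:
    """Build the argument list for a d2b call (flag order is an assumption)."""
    command = ["d2b"]
    if plain:
        command.append("--plain")
    command.append(identifier)
    return command


def append_entry(identifier: str, bib_path: str) -> None:
    """Run d2b and append its standard output to the given .bib file."""
    result = subprocess.run(
        build_d2b_command(identifier),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if d2b exits with an error
    )
    with open(bib_path, "a", encoding="utf-8") as handle:
        handle.write(result.stdout.rstrip() + "\n\n")


# Example call (placeholder identifier, not a real DOI):
# append_entry("10.1234/example", "references.bib")
```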

⚙️ Changing the default configuration

A lot of the features of doi2bibtex can be configured via a ~/.doi2bibtex/config.yaml file. Here is an overview of all the supported options (with the default values):

abbreviate_journal_names: true  # Convert journal names to LaTeX macros (e.g., "\apj" instead of "The Astrophysical Journal")
citekey_delimiter: '_'          # Delimiter between the author name and the year of publication
convert_latex_chars: true       # Convert LaTeX-encoded characters in author names to Unicode
convert_month_to_number: true   # Convert month names to numbers (e.g., "1" instead of "jan")
crossmatch_with_dblp: false     # [EXPERIMENTAL] Try to crossmatch the paper with DBLP to add venue information to `addendum` (for ML conference papers)
fix_arxiv_entrytype: true       # Convert arXiv entries to `@article`, set `journal` to "arXiv preprints", and drop the `eprinttype` field
format_author_names: true       # Convert author names to the "{Lastname}, Firstname" format
generate_citekey: true          # Create a citekey based on the first author and year of publication
limit_authors: 1000             # Limit the number of authors in the BibTeX entry
pygments_theme: 'dracula'       # Pygments theme used for syntax highlighting in the terminal
remove_fields:                  # Remove undesired fields (e.g., keywords) from the BibTeX entry
   - all: ['abstract']          # Remove the `abstract` from all entries, regardless of entrytype
   - article: ['publisher']     # Remove the `publisher` field from @article entries
remove_url_if_doi: true         # Remove the `url` field if it is redundant with the `doi` field
resolve_adsurl: true            # Query ADS to resolve the `adsurl` field, requires API token
update_arxiv_if_doi: true       # Update arXiv entries with DOI information, if available ("related DOI")
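
The remove_fields option is the only nested one, so a small Python sketch may help to show how its rules can be read. The rules below mirror the defaults listed above as they would look after parsing the YAML (a list of single-key mappings). The function strip_fields and the plain-dict representation of an entry are assumptions for illustration, not the package's actual implementation.

```python
from typing import Dict, List

# Default `remove_fields` rules, as a YAML parser would return them
RULES: List[Dict[str, List[str]]] = [
    {"all": ["abstract"]},
    {"article": ["publisher"]},
]


def strip_fields(
    fields: Dict[str, str],
    entrytype: str,
    rules: List[Dict[str, List[str]]] = RULES,
) -> Dict[str, str]:
    """Drop every field listed under `all` or under the given entrytype."""
    unwanted = set()
    for rule in rules:
        for scope, names in rule.items():
            if scope in ("all", entrytype):
                unwanted.update(names)
    return {key: value for key, value in fields.items() if key not in unwanted}
```

With these rules, an article loses both abstract and publisher, while a book loses only abstract.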

🦄 Features

Besides the eponymous ability of resolving DOIs (and other identifiers) to BibTeX entries, this package offers a lot more features for post-processing the entries. Here are some highlights:

  • Automatically resolve the adsurl field required by many astrophysics journals
  • Cross-match entries (in particular: arXiv preprints) with dblp.org to retrieve the venue information for conference papers from machine learning (e.g., "ICLR 2021"). Note: This feature is still experimental because querying dblp is somewhat fickle.
  • Convert LaTeX-encoded characters in author names to Unicode, for example, Müller instead of M{\"u}ller
  • Author names can automatically be converted to the {Lastname}, Firstname format
  • You can limit the number of authors in the BibTeX entry
  • Create a citekey based on the first author and year of publication. The author name is automatically made ASCII-compatible: for example, Đà Nẵng et al. (2023) becomes DaNang_2023.
  • Journal names can automatically be abbreviated according to the common LaTeX macros (e.g., \apj instead of The Astrophysical Journal)
  • Undesired fields (e.g., keywords) can be removed from the BibTeX entry. This is customizable for each entrytype: for example, remove the publisher for articles, but keep it for books.
  • Easy to extend / modify: Feel free to fork this repository and adjust things to your own needs!
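
To illustrate the citekey feature from the list above, here is a rough Python sketch of one way to make an author name ASCII-compatible and join it with the year. It is not the package's own code: the helper names to_ascii and make_citekey are hypothetical, and the small table of extra letters is an assumption that a real tool would need to extend.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping.
# (Assumption: only a handful are covered here.)
EXTRA_LETTERS = str.maketrans(
    {"Đ": "D", "đ": "d", "Ø": "O", "ø": "o", "Ł": "L", "ł": "l", "ß": "ss"}
)


def to_ascii(text: str) -> str:
    """Strip accents and keep only ASCII letters and digits."""
    text = text.translate(EXTRA_LETTERS)
    # NFKD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum())


def make_citekey(first_author: str, year: int, delimiter: str = "_") -> str:
    """Join the ASCII-fied first author and the year with the delimiter."""
    return f"{to_ascii(first_author)}{delimiter}{year}"


print(make_citekey("Đà Nẵng", 2023))  # DaNang_2023
```

The delimiter argument plays the role of the citekey_delimiter option, whose default is '_'.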

🥳 Contributing

Contributions in the form of pull requests are always welcome! Otherwise, you can of course also help the development by creating issues for bugs that you have encountered, or for new features that you would like to see implemented.

📃 License

This project is published under a BSD 3-Clause license; see the LICENSE file for details.