Map gene ids using UniProt.


Keywords: uniprot, genes, mapping, conversion, gene, id, python
License: MIT
Install: pip install gene-map==0.4.3

gene_map

A tool for converting gene IDs between various ID types.

Installation

$ pip install gene_map

Usage

$ gene_map --help
Usage: gene_map [OPTIONS]

  Map gene ids between various formats.

Options:
  -i, --input TEXT                If it exists, treated as file with
                                  whitespace-separated gene ids. Otherwise
                                  treated as a gene id itself.  [required]
  --from TEXT                     Source ID type.  [required]
  --to TEXT                       Target ID type.  [required]
  -o, --output FILENAME           CSV-file to save result to.
  --organism [ARATH_3702|CAEEL_6239|CHICK_9031|DANRE_7955|DICDI_44689|DROME_7227|ECOLI_83333|HUMAN_9606|MOUSE_10090|RAT_10116|SCHPO_284812|YEAST_559292]
                                  Organism to convert IDs in.
  --cache-dir DIRECTORY           Folder to store ID-databases in.
  -q, --quiet                     Suppress logging of mapping-statistics.
  --force-download                Force download of mapping-database.
  --help                          Show this message and exit.
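
For scripted use, the options listed above can be assembled into an argument list from Python. The following is a minimal sketch and not part of gene_map: the helper name build_gene_map_command is hypothetical, and it only uses flags that appear in the help output. Running the resulting command assumes the gene_map executable is installed and on the PATH.

```python
from typing import List, Optional, Sequence


def build_gene_map_command(
    inputs: Sequence[str],
    source: str,
    target: str,
    output: Optional[str] = None,
    organism: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble an argument list for the gene_map CLI (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input is required")
    cmd = ["gene_map"]
    # Each gene ID or file name gets its own -i flag.
    for item in inputs:
        cmd += ["-i", str(item)]
    cmd += ["--from", source, "--to", target]
    if output is not None:
        cmd += ["-o", output]
    if organism is not None:
        cmd += ["--organism", organism]
    if quiet:
        cmd.append("--quiet")
    return cmd


cmd = build_gene_map_command(
    ["P35222", "mygenes.txt"],
    source="ACC",
    target="Gene_Name",
    output="gene_mapping.csv",
)
print(" ".join(cmd))

# To actually run it (requires gene_map to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with IDs that contain dots or other special characters.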

Getting started

Command-line usage

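Inputs can be either gene IDs or files containing whitespace-separated gene IDs. The -i option can be repeated, and IDs that cannot be mapped (such as InvalidID below) are left out of the result: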

$ cat mygenes.txt
P63244 P08246
P68871
$ gene_map \
    -i P35222 -i InvalidID -i mygenes.txt -i P04637 \
    --from ACC --to Gene_Name \
    -o gene_mapping.csv
Mapped 5/6 genes.
$ cat gene_mapping.csv
ID_from,ID_to
P04637,TP53
P08246,ELANE
P35222,CTNNB1
P63244,RACK1
P68871,HBB
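
The resulting CSV file can be post-processed with the Python standard library. The sketch below is an illustration under stated assumptions: load_mapping is a hypothetical helper, the sample text is copied from the output above instead of being read from disk, and values are collected in lists on the assumption that a source ID might appear on more than one row.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Sample content copied from the gene_mapping.csv shown above.
SAMPLE_CSV = """ID_from,ID_to
P04637,TP53
P08246,ELANE
P35222,CTNNB1
P63244,RACK1
P68871,HBB
"""


def load_mapping(handle: TextIO) -> Dict[str, List[str]]:
    """Read ID_from/ID_to rows into a dict of lists (hypothetical helper)."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        mapping[row["ID_from"]].append(row["ID_to"])
    return dict(mapping)


# With a real file: open("gene_mapping.csv", newline="") instead of StringIO.
mapping = load_mapping(io.StringIO(SAMPLE_CSV))
print(mapping["P04637"])

# IDs that were not mapped simply have no entry.
print("InvalidID" in mapping)
```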

To convert inputs without knowing their ID type, use --from auto:

$ gene_map \
    -i P35222 \
    -i TP53 \
    -i '9606.ENSP00000306407' \
    --from auto \
    --to GeneID
Mapped 3/3 genes.
ID_from,ID_to
9606.ENSP00000306407,79007
P35222,1499
TP53,7157

Attention: if an ID is valid for more than one ID type, it may be interpreted as a type you did not intend, and the resulting mapping may be wrong. Furthermore, all IDs are treated as strings.
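
Because all IDs are treated as strings, it may help to normalize a mixed list before handing it to the tool, for example when numeric-looking IDs such as the GeneID 1499 were loaded as integers. This is a minimal sketch; normalize_ids is a hypothetical helper and not part of gene_map, and whether gene_map itself accepts non-string input is not stated here.

```python
from typing import Iterable, List


def normalize_ids(raw_ids: Iterable[object]) -> List[str]:
    """Convert IDs to stripped strings, dropping blanks and duplicates.

    The order of first appearance is preserved.
    """
    seen = set()
    result: List[str] = []
    for raw in raw_ids:
        text = str(raw).strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result


ids = normalize_ids([1499, "TP53", " P35222 ", "TP53", ""])
print(ids)  # ['1499', 'TP53', 'P35222']
```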

API usage

>>> from gene_map import GeneMapper

>>> stringdb_ids = ['9606.ENSP00000306407', '9606.ENSP00000337461']
>>> gm = GeneMapper()  # defaults to HUMAN_9606
>>> gm.query(stringdb_ids, source_id_type='STRING', target_id_type='GeneID')
#                ID_from  ID_to
#0  9606.ENSP00000306407  79007
#1  9606.ENSP00000337461  90529