godb

A set of annotation maps describing most of the Gene Ontology.


Keywords
Gene, Ontology
License
GPL-3.0
Install
pip install godb==0.0.5

Documentation

godb

godb is a Gene Ontology library for Python (2.7 and 3+) that contains a set of annotation maps describing most of the Gene Ontology.

It downloads, parses and exposes the Gene Ontology data in dataframes.

Note that the GitHub version might not be stable; install the released version with pip install godb.

Changelog

# 0.0.5 (20.10.2015)
- Made Python 3 compatible.
- Added version info (godb.__version__).

Usage

Get annotations

godb.get_annotations() returns the annotation table:

import godb
anno = godb.get_annotations()

anno.head(3)

GO id   Ontology    Term    Synonym Definition
GO:0000001  BP  mitochondrion inheritance   mitochondrial inheritance   The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
GO:0000002  BP  mitochondrial genome maintenance        The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
GO:0000003  BP  reproduction    reproductive physiological process  The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.

len(anno)
# 41688

If there are multiple synonyms, these are separated with ;. While this makes the data untidy, it avoids having to include an arbitrary number of columns, many of which would be empty (for most rows).
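
If you need one synonym per row, the ;-separated column can be split after the fact. The following is a minimal sketch on a toy table, not part of godb: the column names follow the output above, the row with two synonyms is made up for illustration, and the assumption that a missing synonym shows up as an empty string or NaN has not been checked against the real table.

```python
import pandas as pd

# Toy stand-in for the table returned by godb.get_annotations().
# The last row is hypothetical and only there to show the ';' case.
anno = pd.DataFrame({
    "GO id": ["GO:0000001", "GO:0000002", "GO:9999999"],
    "Synonym": ["mitochondrial inheritance", "", "synonym a;synonym b"],
})

def split_synonyms(value):
    """Turn a ';'-separated string into a list; empty or missing values give []."""
    if not isinstance(value, str) or not value.strip():
        return []
    return [s.strip() for s in value.split(";") if s.strip()]

# One row per (GO id, synonym) pair; terms without synonyms are dropped.
tidy = (
    anno[["GO id"]]
    .assign(Synonym=anno["Synonym"].apply(split_synonyms))
    .explode("Synonym")
    .dropna(subset=["Synonym"])
    .reset_index(drop=True)
)
```

With the real table you would replace the toy anno with the result of godb.get_annotations().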

Get maps of parents and children

The functions get_children and get_offspring return maps of the ontology graph: get_children lists the direct parents of each term together with the relation type, and get_offspring lists all the ancestors of each term.

cc_children = godb.get_children("CC")

cc_children.head(3)
#            Child      Parent  Relation
# 0     GO:0000015  GO:0044445      is_a
# 1     GO:0000015  GO:1902494      is_a
# 2     GO:0000109  GO:0044428      is_a

len(cc_children)
# 5511

cc_offspring = godb.get_offspring("CC")

cc_offspring.head(3)
#     Offspring      Parent
# 0  GO:0000015  GO:0044445
# 0  GO:0000110  GO:0044428
# 1  GO:0000111  GO:0044428

len(cc_offspring)
# 30658
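
The maps only contain GO ids. To read them more easily you can look up the term names in the annotation table. The sketch below uses toy stand-ins for both tables; the column names are taken from the outputs above, while the term names and the ChildTerm and ParentTerm columns are made up for illustration.

```python
import pandas as pd

# Toy stand-ins: anno mimics godb.get_annotations(),
# children mimics godb.get_children("CC"). Term names are hypothetical.
anno = pd.DataFrame({
    "GO id": ["GO:0000015", "GO:0044445"],
    "Term": ["hypothetical child term", "hypothetical parent term"],
})
children = pd.DataFrame({
    "Child": ["GO:0000015", "GO:0000015"],
    "Parent": ["GO:0044445", "GO:1902494"],
    "Relation": ["is_a", "is_a"],
})

# Lookup table from GO id to term name.
names = anno.set_index("GO id")["Term"]

# Ids missing from the lookup table end up as NaN.
labelled = children.assign(
    ChildTerm=children["Child"].map(names),
    ParentTerm=children["Parent"].map(names),
)
```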

Both get_offspring and get_children take the argument relations, which is ["is_a", "part_of", "has_part"] by default. If you want to ignore certain relations when computing children or offspring, change this argument. R's GO.db uses the relationships ["is_a", "part_of"] to compute ancestors, so use these to get identical behavior.

godb.get_offspring("CC", ["is_a", "part_of"]).head(3)
#          Offspring      Parent
# 0      GO:0000015  GO:0044445
# 1      GO:0000110  GO:0044428
# 2      GO:0000111  GO:0044428
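
To illustrate why the choice of relations changes the result, here is a rough sketch of how an ancestor map can be derived from child/parent edges. This is not godb's actual implementation; the edges and ids are placeholders, and the function name ancestors is made up for the example.

```python
from collections import defaultdict

# Hypothetical (child, parent, relation) rows in the shape of the
# get_children output. The ids are placeholders, not real GO terms.
edges = [
    ("GO:C", "GO:B", "is_a"),
    ("GO:B", "GO:A", "part_of"),
    ("GO:D", "GO:A", "has_part"),
]

def ancestors(edges, relations=("is_a", "part_of", "has_part")):
    """Map each term to every term reachable by following the kept relations upward."""
    parents = defaultdict(set)
    for child, parent, relation in edges:
        if relation in relations:
            parents[child].add(parent)

    result = {}
    for term in list(parents):
        seen = set()
        stack = list(parents[term])
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            # .get avoids adding new keys to the defaultdict while iterating.
            stack.extend(parents.get(current, ()))
        result[term] = seen
    return result

all_relations = ancestors(edges)
is_a_part_of = ancestors(edges, relations=("is_a", "part_of"))
```

In this toy graph GO:D only has a has_part edge, so it appears in all_relations but drops out of is_a_part_of.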

Note that the first time a godb function is used, the Gene Ontology data file is downloaded, which may take some time. To see a message about the download, set the logging level to INFO.

import logging
logging.basicConfig(level=logging.INFO)

Install

pip install godb

Requirements

joblib and pandas, both of which are automatically installed when using pip to install godb.

TODOs

  • (Possibly) Expose a command line interface similar to that of kg and biomartian. I do not use GO enough to warrant it yet, though.

Contribute

Report bugs, ask questions or request features at the issues page.

FAQ

How do I get the genes associated with a term?

Gene-to-term associations can be retrieved with biomartian, for example:

biomartian -d rnorvegicus_gene_ensembl  -i external_gene_name -o go_id | shuf -n 10
Lpcat1  GO:0005509
Klb GO:0005975
LOC498555   GO:0003735
Map3k12 GO:0046777
Hoxb1   GO:0045944
Cir1    GO:0006397
Rhoc    GO:0005525
Casr    GO:0060613
Cib1    GO:1900026
Onecut1 GO:0002064

See biomartian for more info.
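
Once you have such a gene/GO id listing, grouping it by term gives the genes for each term. The sketch below assumes the two-column, whitespace-separated layout shown above and uses three lines copied from that output; the function name genes_by_term is made up, and how you capture the biomartian output (file, pipe) is left out.

```python
from collections import defaultdict

# Lines copied from the biomartian output above (gene name, GO id).
biomartian_output = """\
Lpcat1\tGO:0005509
Klb\tGO:0005975
Rhoc\tGO:0005525
"""

def genes_by_term(text):
    """Group gene names by GO id, skipping lines that do not have two fields."""
    genes = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        gene, go_id = fields
        genes[go_id].add(gene)
    return dict(genes)

term_to_genes = genes_by_term(biomartian_output)
print(sorted(term_to_genes["GO:0005525"]))
# ['Rhoc']
```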

Inspiration

R's GO.db package.