Genes is a Django app to represent genes.
Download and Install
This package is registered as django-genes in PyPI and is pip installable:

    pip install django-genes

If any of the following dependency packages are not found on your system,
pip will install them too:
django 1.8 or later (Django web framework)
Organisms model, which is required by
Search Indexes and Data Template section.)
django-fixtureless (for unittest, see
1. Add 'genes' and 'organisms' to your INSTALLED_APPS setting like this:

    INSTALLED_APPS = (
        ...
        'organisms',
        'genes',
    )

2. Run the python manage.py migrate command to create the database tables for the genes and organisms models.
3. (Optional) The following step is only needed if you have
django-tastypie installed to create a REST API for your project and
would like to have API endpoints for organisms and genes.
Add the following to your project's URLconf (typically urls.py):

    # There are probably already other imports here, such as:
    # from django.conf.urls import url, patterns, include

    # If you have not already done so, import the tastypie API:
    from tastypie.api import Api

    # Import the API Resources for Organisms and Genes:
    from organisms.api import OrganismResource
    from genes.api import GeneResource
    # Only needed if you register the cross-reference resources below
    # (assumed here to be defined in genes.api alongside GeneResource):
    from genes.api import CrossRefResource, CrossRefDBResource

    # If you have not already done so, initialize your API and
    # add the Organism and Gene Resources to it. You can also register
    # the CrossRefResource and CrossRefDBResource if you want to have
    # API endpoints for them as well.
    v0_api = Api()
    v0_api.register(OrganismResource())
    v0_api.register(GeneResource())
    v0_api.register(CrossRefResource())
    v0_api.register(CrossRefDBResource())

    # In the urlpatterns, include the urls for this api:
    urlpatterns = patterns('',
        ...
        (r'^api/', include(v0_api.urls))
    )

Search Indexes and Data Template
search_indexes.py can be used by django-haystack
(https://github.com/django-haystack/django-haystack) to search genes.
It specifies which Gene fields are included in the search
index and how they are weighted. The text field refers to a
document that is built for the search engine to index. The location of
the data template for this document is:
For more information, see: http://django-haystack.readthedocs.org/en/latest/tutorial.html#handling-data
Usage of Management Commands
This app includes five management commands in its management/commands directory:
1. genes_add_xrdb
This command adds a cross-reference database for genes. It must be called once for every new cross-reference database before gene and cross-reference objects for that database are populated. It requires 2 arguments:
- name: the name of the database
- URL: the URL for that database, with the string '_REPL_' added at the end of the URL
For example, this command adds Ensembl as a cross-reference database:

    python manage.py genes_add_xrdb --name=Ensembl --URL=http://www.ensembl.org/Gene/Summary?g=_REPL_

And this command adds MIM as a cross-reference database:

    python manage.py genes_add_xrdb --name=MIM --URL=http://www.ncbi.nlm.nih.gov/omim/_REPL_
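
The '_REPL_' marker in these URLs acts as a placeholder for an individual cross-reference identifier. A minimal sketch of how such a template could be expanded is shown below. The helper build_xref_url is hypothetical and written only for illustration (it is not a function provided by this package), and the Ensembl identifier is just an example value.

```python
# Hypothetical helper (not part of django-genes): expand a URL template
# registered with genes_add_xrdb by substituting an identifier for _REPL_.
PLACEHOLDER = '_REPL_'


def build_xref_url(url_template, xrid):
    """Return url_template with the placeholder replaced by xrid."""
    if PLACEHOLDER not in url_template:
        raise ValueError('URL template does not contain %s' % PLACEHOLDER)
    return url_template.replace(PLACEHOLDER, str(xrid))


# Example template taken from the genes_add_xrdb call above.
ensembl_template = 'http://www.ensembl.org/Gene/Summary?g=_REPL_'
print(build_xref_url(ensembl_template, 'ENSG00000139618'))
```

Keeping the placeholder explicit in the stored URL means the same template works whether the identifier belongs in the path (as for MIM) or in a query string (as for Ensembl).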
2. genes_load_geneinfo
This command parses gene info file(s) and saves the corresponding gene objects into the database. It takes 2 required arguments and 5 optional arguments:
- (Required) geneinfo_file: location of gene info file;
- (Required) taxonomy_id: taxonomy ID for organism for which genes are being populated;
- (Optional) gi_tax_id: alternative taxonomy ID for some organisms (such as S. cerevisiae);
- (Optional) symbol_col: symbol column in gene info file. Default is 2;
- (Optional) systematic_col: systematic column in gene info file. Default is 3;
- (Optional) alias_col: the column containing gene aliases. If a hyphen '-' or blank space ' ' is passed, symbol_col will be used. Default is 4.
- (Optional) put_systematic_in_xrdb: name of the cross-reference database for which the organism's systematic IDs should be used as CrossReference IDs. This is useful for Pseudomonas, for example, whose systematic IDs are saved into the PseudoCAP cross-reference database.
The following example shows how to download a gzipped human gene info file from the NIH FTP server and populate the database from this file.

    # Create a temporary data directory:
    mkdir data

    # Download a gzipped human gene info file into data directory:
    wget -P data/ -N ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz

    # Unzip downloaded file:
    gunzip -c data/Homo_sapiens.gene_info.gz > data/Homo_sapiens.gene_info

    # Call genes_load_geneinfo to populate the database:
    python manage.py genes_load_geneinfo --geneinfo_file=data/Homo_sapiens.gene_info --taxonomy_id=9606 --systematic_col=3 --symbol_col=2

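
To make the column arguments more concrete, here is a rough sketch of how one line of a gene info file could be split into the fields described above. It is not the code used by genes_load_geneinfo. It assumes that the file is tab-separated, that the column arguments are zero-based indexes (which matches the defaults 2, 3 and 4 against the NCBI layout of tax_id, GeneID, Symbol, LocusTag, Synonyms), and that synonyms are separated by '|' with '-' meaning "no value". The function name and the returned dictionary keys are hypothetical.

```python
# Illustrative sketch only; not the parser used by genes_load_geneinfo.
# Assumption: column arguments are zero-based indexes into tab-separated fields.


def parse_gene_info_line(line, symbol_col=2, systematic_col=3, alias_col=4):
    """Extract symbol, systematic name and aliases from one gene_info line."""
    fields = line.rstrip('\n').split('\t')
    raw_aliases = fields[alias_col]
    # NCBI uses '-' for an empty value and '|' to separate synonyms.
    aliases = [] if raw_aliases == '-' else raw_aliases.split('|')
    return {
        'tax_id': fields[0],
        'entrez_id': fields[1],
        'symbol': fields[symbol_col],
        'systematic_name': fields[systematic_col],
        'aliases': aliases,
    }


# Sample row that mimics the first five gene_info columns.
sample = '9606\t1\tA1BG\t-\tA1B|ABG|GAB|HYST2477\n'
print(parse_gene_info_line(sample))
```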
3. genes_load_uniprot
This command can be used to populate the database with UniProtKB identifiers. It takes one argument:
- uniprot_file: location of a file mapping UniProtKB IDs to Entrez and Ensembl IDs
Important: Before calling this command, please make sure that both Ensembl and Entrez identifiers have been loaded into the database.
After downloading the gzipped file, use the zgrep command to extract the lines we need (the original file is quite large), then run this command:

    wget -P data/ -N ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz
    zgrep -e "GeneID" -e "Ensembl" data/idmapping.dat.gz > data/uniprot_entrez_ensembl.txt
    python manage.py genes_load_uniprot --uniprot_file=data/uniprot_entrez_ensembl.txt

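
If zgrep is not available, the same filtering step can be approximated in Python. The sketch below is only an illustration: filter_idmapping is a hypothetical helper, not part of this package, and the demonstration at the end uses made-up mapping lines in a temporary directory instead of the real UniProt file.

```python
import gzip
import os
import tempfile


def filter_idmapping(src_path, dest_path, keywords=('GeneID', 'Ensembl')):
    """Copy lines containing any keyword from a gzipped file to a text file."""
    kept = 0
    with gzip.open(src_path, 'rt') as src, open(dest_path, 'w') as dest:
        for line in src:
            # Plain substring match, like zgrep -e "GeneID" -e "Ensembl".
            if any(keyword in line for keyword in keywords):
                dest.write(line)
                kept += 1
    return kept


# Small self-contained demonstration on made-up mapping lines.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, 'idmapping.dat.gz')
demo_dest = os.path.join(demo_dir, 'uniprot_entrez_ensembl.txt')
with gzip.open(demo_src, 'wt') as handle:
    handle.write('P00001\tGeneID\t1\n')
    handle.write('P00001\tRefSeq\tNP_000001.1\n')
    handle.write('P00001\tEnsembl\tENSG00000000001\n')
kept_lines = filter_idmapping(demo_src, demo_dest)
print(kept_lines)
```

For the real file, the call would be filter_idmapping('data/idmapping.dat.gz', 'data/uniprot_entrez_ensembl.txt'). Because the file is read line by line, the large archive never has to be fully decompressed on disk.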
4. genes_load_wb
This command can be used to populate the database with WormBase identifiers. It takes the following arguments:
- (Required) wb_url: URL of the WormBase xrefs file;
- (Optional) db_name: the name of the cross-reference database, default is 'WormBase'.
As expected, the WormBase cross-reference database should be added using the genes_add_xrdb command (see command #1) before this command is run to populate the WormBase identifiers. Here is an example:

    # Find latest version of WormBase here:
    # http://www.wormbase.org/about/release_schedule#102--10-1
    python manage.py genes_load_wb --wb_url=ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS243.xrefs.txt.gz

5. genes_load_gene_history
This management command reads an input gene history file and finds all genes whose tax_id matches the input taxonomy ID. If a gene already exists in the database, its Gene record is marked as obsolete; if not, a new obsolete Gene record is created in the database.
The command accepts 2 required arguments and 3 optional arguments:
- (Required) gene_history_file: Input gene history file. A gzipped example file can be found at: ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz
- (Required) tax_id: Taxonomy ID assigned by NCBI to a certain organism. Genes of the other organisms in input file will be skipped.
- (Optional) tax_id_col: column number of tax_id in input file. Default is 1.
- (Optional) discontinued_id_col: column number of discontinued GeneID in input file. Default is 3.
- (Optional) discontinued_symbol_col: column number of gene's discontinued symbol in input file. Default is 4.
Note that column numbers in the last three arguments all start from 1, not 0.
For example, to add obsolete genes whose tax_id is 208964 in the file "gene_history", use commands like these:

    # Download file into your data directory:
    cd /data_dir; wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz

    # Unzip the downloaded file into "gene_history"
    gunzip gene_history.gz

    # Run management command:
    python manage.py genes_load_gene_history /data_dir/gene_history 208964 --tax_id_col=1 --discontinued_id_col=3 --discontinued_symbol_col=4

(The arguments --tax_id_col=1 --discontinued_id_col=3 --discontinued_symbol_col=4 are optional here because they use the default values.)
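
The 1-based column convention is easy to get wrong, so the following sketch shows one way the filtering step could look. It is not the command's actual implementation. It assumes a tab-separated file whose header lines start with '#', and the function name and sample rows are made up for illustration.

```python
# Illustrative sketch only; not the code used by genes_load_gene_history.


def find_discontinued_genes(lines, tax_id, tax_id_col=1,
                            discontinued_id_col=3,
                            discontinued_symbol_col=4):
    """Yield (discontinued_id, discontinued_symbol) for the given tax_id."""
    for line in lines:
        if line.startswith('#'):
            continue  # skip header/comment lines
        fields = line.rstrip('\n').split('\t')
        # Column arguments are 1-based, so subtract 1 for list indexing.
        if fields[tax_id_col - 1] != str(tax_id):
            continue  # genes of other organisms are skipped
        yield (fields[discontinued_id_col - 1],
               fields[discontinued_symbol_col - 1])


# Made-up rows in the assumed layout:
# tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date
sample_lines = [
    '#tax_id\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\tDiscontinue_Date\n',
    '208964\t-\t1000001\tPA0001a\t20100101\n',
    '9606\t-\t2000002\tFAKE1\t20100101\n',
]
print(list(find_discontinued_genes(sample_lines, 208964)))
```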