GMGC-mapper
Command line tool to query the Global Microbial Gene Catalog (GMGC).
Install
GMGC-mapper runs on Python 3.6-3.8 and requires prodigal to be available for genome mode.
Conda install
The easiest way to install GMGC-mapper is through bioconda, which will ensure
all dependencies (including prodigal) are installed automatically:
conda install -c bioconda gmgc-mapper
pip install
Alternatively, GMGC-mapper is available from PyPI, so it can be installed
through pip:
pip install GMGC-mapper
Note that this does not install prodigal (which is necessary for the
genome-based workflow).
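Because a pip install leaves prodigal out, it can be handy to check for it before attempting genome mode. The following is a minimal sketch, not part of GMGC-mapper itself; it assumes the executable is named prodigal and is expected on the PATH, and the helper name tool_available is hypothetical.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Assumption: the prodigal executable is simply called "prodigal".
    if tool_available("prodigal"):
        print("prodigal found: genome mode should be usable")
    else:
        print("prodigal not found: install it before using genome mode")
```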
Install from source
Finally, especially if you are retrieving the cutting-edge version from GitHub, you can install it with the standard command:
python setup.py install
Examples
- Input is a genome sequence.
gmgc-mapper -i input.fasta -o output
- Input is DNA/protein gene sequences.
gmgc-mapper --nt-genes genes.fna --aa-genes genes.faa -o output
The nucleotide input is optional (but should be used if available so that the quality of the hits can be refined):
gmgc-mapper --aa-genes genes.faa -o output
If your input is a metagenome, you can use NGLess for assembly and gene prediction. For more details, read the docs.
Output
The output folder will contain:
- Outputs of gene prediction (prodigal).
- Complete data table, listing all the hits in GMGC, per gene.
- Complete table, listing all the genome bins (MAGs) that are found in the results.
- Human readable summary.
For more details, read the docs. A description of the outputs is also written to the output folder for convenience.
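To see what a run actually produced, one option is simply to list the files in the output folder. The sketch below makes no assumption about the file names GMGC-mapper writes; the function name list_outputs and the folder name output (taken from the examples above) are illustrative only.

```python
from pathlib import Path
from typing import List, Tuple


def list_outputs(folder: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under `folder`.

    The result is sorted by path; a missing folder yields an empty list.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # "output" matches the -o value used in the examples above.
    for name, size in list_outputs("output"):
        print(f"{size:>12}  {name}")
```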
Parameters
- -i/--input: path to the input genome file (.fasta/.gz/.bz2).
- -o/--output: output directory (will be created if non-existent).
- --nt-genes: path to the input DNA gene file (.fasta/.gz/.bz2).
- --aa-genes: path to the input protein gene file (.fasta/.gz/.bz2).
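When driving GMGC-mapper from a script, the parameters above can be assembled into an argument list. This is a minimal sketch that uses only the flags documented here. The helper build_command is hypothetical, and its checks (treating genome mode and gene mode as mutually exclusive, and requiring --aa-genes in gene mode) are this sketch's reading of the examples, not documented behaviour of the tool.

```python
from typing import List, Optional


def build_command(output: str,
                  genome: Optional[str] = None,
                  aa_genes: Optional[str] = None,
                  nt_genes: Optional[str] = None) -> List[str]:
    """Build a gmgc-mapper argument list for genome mode or gene mode."""
    gene_mode = aa_genes is not None or nt_genes is not None
    if genome is not None and gene_mode:
        # Assumption of this sketch: the two input modes are not combined.
        raise ValueError("give either a genome or gene files, not both")
    if genome is None and aa_genes is None:
        # Assumption: protein genes are needed; nucleotide genes are optional.
        raise ValueError("need a genome or an --aa-genes file")

    cmd = ["gmgc-mapper"]
    if genome is not None:
        cmd += ["-i", genome]
    else:
        cmd += ["--aa-genes", aa_genes]
        if nt_genes is not None:
            cmd += ["--nt-genes", nt_genes]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_command("output", genome="input.fasta")))
```

Passing the returned list to subprocess.run(cmd, check=True) would launch the tool, provided gmgc-mapper is installed and on the PATH.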