plentyofbugs

Find your sequenced isolate a compatible reference genome based on location, interests, and Mash distance.

This repository contains the pipeline to identify a suitable reference isolate by comparing a mini assembly to a set of complete genomes with Mash. For more information on using this with riboSeed, see this page on choosing reference genomes.

Installation

Manual Installation

Dependencies

plentyofbugs requires

skesa (or spades)
seqtk
mash

conda create -n pob skesa seqtk mash
conda activate pob
pip install plentyofbugs
plentyofbugs  -h
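
Before a first run, it can help to confirm that the external tools are on the PATH. The following is a minimal sketch and is not part of plentyofbugs: the function name `missing_tools` is hypothetical, and the executable names are assumed to match the dependency names listed above.

```python
import shutil


def missing_tools(tools=("skesa", "seqtk", "mash")):
    """Return the names in `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

If you assemble with SPAdes instead of skesa, pass your own list of names, using whatever executable name your SPAdes installation provides.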

Running

Get example data

The data in the test_data directory was gathered using the test_data/get_data.sh script. It consists of 19 plasmids from E. coli, plus a test plasmid from which reads were generated using ART (https://www.niehs.nih.gov/research/resources/software/biostatistics/art/). The contigs in this directory were assembled with SKESA.

Running on example data

# running with raw reads
plentyofbugs -g ./test_data/plasmids/  -f ./test_data/test_reads1.fq -o tmp
# running with an assembly instead of raw reads
plentyofbugs -g ./test_data/plasmids/  --assembly ./test_data/contigs.fasta -o tmp
# running against a new cast of E. coli genomes, downloading the required genomes along the way
plentyofbugs -g ./new_comparison_e_coli/ -n 5  --assembly ./test_data/contigs.fasta -o tmp --genus_species "Escherichia coli"

Just want to download the genomes?

plentyofbugs includes its genome downloader as a standalone script: get_n_genomes.

get_n_genomes -o "Escherichia coli" -g tmpnewgenomes -n 4

Running via container

NOTE: To run the legacy version that used pyani, use a version older than 0.87 with Docker or Singularity. It will save you a lot of trouble!

Docker

docker run --rm -t -v  ${PWD}:/data/ nickp60/plentyofbugs:0.97 -f /data/test_reads1.fq --genus_species "Escherichia coli" -n 5 -o /data/results/

which follows the general form

docker run --rm -t -v  <current directory>:/data/ nickp60/plentyofbugs:0.97  -f /data/<name of F reads file> --genus_species "<bug of interest>" -n <max number of strains to compare with>  -o /data/<name for output folder>/

Singularity

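singularity pull docker://nickp60/plentyofbugs:0.97
# the image file name depends on your Singularity version (for example, plentyofbugs_0.97.sif with Singularity 3.x)
./plentyofbugs_0.97.sif -f ./test_reads1.fq --genus_species "Escherichia coli" -n 5  -o ./results/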

Under the hood

What it does:

Runs a mini assembly with skesa or SPAdes
Identifies which complete genomes are available for the genus and species
(Optional) Subsets the number of complete genomes
Downloads the reference genomes
Indexes/sketches the genomes
Calculates the Mash distance/ANI of the mini assembly to the database of reference strains
Reports the closest reference genome and the distance/ANI
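
The last two steps amount to picking the reference with the smallest Mash distance. The sketch below is not the plentyofbugs implementation. It assumes the standard five-column, tab-delimited output of `mash dist` (reference, query, distance, p-value, shared hashes). The function name `closest_reference`, the file names, and the numbers in `example` are made up for illustration. ANI is approximated here as 1 - distance.

```python
def closest_reference(mash_dist_text):
    """Return (reference, distance, approx_ani) for the closest reference.

    `mash_dist_text` is assumed to be tab-delimited `mash dist` output:
    reference, query, distance, p-value, shared hashes.
    """
    best = None
    for line in mash_dist_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        reference, distance = fields[0], float(fields[2])
        if best is None or distance < best[1]:
            best = (reference, distance)
    if best is None:
        raise ValueError("no mash dist rows found")
    reference, distance = best
    # Approximation (an assumption of this sketch): ANI is roughly 1 - Mash distance
    return reference, distance, 1.0 - distance


# Illustrative, made-up values; not real results
example = (
    "plasmid_A.fasta\tcontigs.fasta\t0.0251\t0\t420/1000\n"
    "plasmid_B.fasta\tcontigs.fasta\t0.0042\t0\t850/1000\n"
    "plasmid_C.fasta\tcontigs.fasta\t0.1130\t1.2e-30\t35/1000\n"
)
reference, distance, approx_ani = closest_reference(example)
print(reference, distance, approx_ani)
```

In this made-up example, `plasmid_B.fasta` is selected because it has the smallest distance to the mini assembly.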

PlentyofBugs
Release 0.999