SnakeMake Bioinformatics Library.


Keywords
Bioinformatics, Computational, Biology, Next-Generation, Sequencing, Snakemake, Pipelines
License
MIT
Install
pip install SMBL==1.4.5

Documentation

SMBL - SnakeMake Bioinformatics Library

https://travis-ci.org/karel-brinda/smbl.svg?branch=master https://travis-ci.org/karel-brinda/smbl.svg?branch=devel

In case of any problem, don't hesitate to contact me on karel.brinda@gmail.com.

Short description

SMBL is a library of some useful rules and Python functions which can be used in Snakemake (https://bitbucket.org/johanneskoester/snakemake/) pipelines. It makes possible to automatically install various bioinformatics programs like read mappers, read simulators, conversion tools, etc. It supports also downloading and conversion of some important references in FASTA format (e.g., human genome).

Installation / upgrade

To install SMBL, you need to have Unix-like operating system (e.g., Linux, MacOS) and Python at least 3.3. Installation / upgrade can be performed using the following command.

pip3 install --upgrade smbl

If SnakeMake has not been installed, yet, it will be installed automatically with SMBL.

The current version of SMBL from git can be installed by

pip3 install --upgrade git+git://github.com/karel-brinda/smbl

Requirements

To be able to download and install software automatically, SMBL requires the following programs to be present in you Unix system:

  • wget or curl
  • gcc 4.7+
  • git
  • make

Usage

To use SMBL, you have to import the smbl Python package and include a file with all rules using:

import smbl
include: smbl.include()

Then you can use all supported programs or data. When they appear as input of a rule, they will be downloaded or compiled.

All the programs are installed into ~/.smbl/bin/ and all FASTA files into ~/.smbl/fa/.

Programs

Program Variable with its filename Link
art_454 smbl.prog.ART_454 http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
art_illumina smbl.prog.ART_ILLUMINA http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
art_solid smbl.prog.ART_SOLID http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
bcftools smbl.prog.BCFTOOLS http://github.com/samtools/bcftools
bfast smbl.prog.BFAST http://github.com/nh13/bfast
bgzip smbl.prog.BGZIP http://github.com/samtools/htslib
bowtie2 smbl.prog.BOWTIE2 http://github.com/BenLangmead/bowtie2
bowtie2-build smbl.prog.BOWTIE2_BUILD http://github.com/BenLangmead/bowtie2
bowtie2-inspect smbl.prog.BOWTIE2_INSPECT http://github.com/BenLangmead/bowtie2
bwa smbl.prog.BWA http://github.com/lh3/bwa
curesim.jar smbl.prog.CURESIM http://www.pegase-biosciences.com/tools/curesim/
curesim_eval.jar smbl.prog.CURESIM_EVAL http://www.pegase-biosciences.com/tools/curesim/
deez smbl.prog.DEEZ http://github.com/sfu-compbio/deez
drfast smbl.prog.DRFAST http://github.com/BilkentCompGen/drfast
dwgsim smbl.prog.DWGSIM http://github.com/nh13/dwgsim
dwgsim_eval.pl smbl.prog.DWGSIM_EVAL http://github.com/nh13/dwgsim
freec smbl.prog.FREEC http://bioinfo-out.curie.fr/projects/freec/
gem-indexer smbl.prog.GEM_INDEXER http://algorithms.cnag.cat/wiki/The_GEM_library
gem-mapper smbl.prog.GEM_MAPPER http://algorithms.cnag.cat/wiki/The_GEM_library
gem-2-sam smbl.prog.GEM_2_SAM http://algorithms.cnag.cat/wiki/The_GEM_library
gnuplot4 smbl.prog.GNUPLOT4 http://www.gnuplot.info/
gnuplot5 smbl.prog.GNUPLOT5 http://www.gnuplot.info/
kallisto smbl.prog.KALLISTO https://github.com/pachterlab/kallisto
lastal smbl.prog.LASTAL http://last.cbrc.jp/
lastdb smbl.prog.LASTDB http://last.cbrc.jp/
mason_frag_sequencing smbl.prog.MASON_FRAG_SEQUENCING http://packages.seqan.de/mason2/
mason_genome smbl.prog.MASON_GENOME http://packages.seqan.de/mason2/
mason_materializer smbl.prog.MASON_MATERIALIZER http://packages.seqan.de/mason2/
mason_methylation smbl.prog.MASON_METHYLATION http://packages.seqan.de/mason2/
mason_simulator smbl.prog.MASON_SIMULATOR http://packages.seqan.de/mason2/
mason_splicing smbl.prog.MASON_SPLICING http://packages.seqan.de/mason2/
mason_variator smbl.prog.MASON_VARIATOR http://packages.seqan.de/mason2/
mrfast smbl.prog.MRFAST http://github.com/BilkentCompGen/mrfast
mrsfast smbl.prog.MRSFAST http://mrsfast.sourceforge.net/
perm smbl.prog.PERM http://code.google.com/p/perm/
pbsim smbl.prog.PBSIM https://code.google.com/p/pbsim
picard smbl.prog.PICARD http://broadinstitute.github.io/picard/
sambamba smbl.prog.SAMBAMBA http://lomereiter.github.io/sambamba/
samtools smbl.prog.SAMTOOLS http://github.com/samtools/samtools
sirfast smbl.prog.SIRFAST http://github.com/BilkentCompGen/sirfast
storm-color smbl.prog.STORM_COLOR http://bioinfo.lifl.fr/yass/iedera_solid/storm/
storm-nucleotide smbl.prog.STORM_NUCLEOTIDE http://bioinfo.lifl.fr/yass/iedera_solid/storm/
tabix smbl.prog.TABIX http://github.com/samtools/htslib
twoBitToFa smbl.prog.TWOBITTOFA http://hgdownload.cse.ucsc.edu/admin/exe/
vcfutils.pl smbl.prog.VCFTULS http://github.com/samtools/bcftools
wgsim smbl.prog.WGSIM http://github.com/lh3/wgsim
wgsim_eval.pl smbl.prog.WGSIM_EVAL http://github.com/lh3/wgsim
xs smbl.prog.XS http://bioinformatics.ua.pt/software/xs/

FASTA files

FASTA file Variable with its filename
An example small FASTA file smbl.fasta.EXAMPLE_1
An example small FASTA file smbl.fasta.EXAMPLE_2
An example small FASTA file smbl.fasta.EXAMPLE_3
Human genome HG38 (GRCh38) smbl.fasta.HG38, smbl.fasta.HUMAN_GRCH38
Mouse genome MM10 smbl.fasta.MOUSE_MM10
Chimpanzee genome PANTR04 smbl.fasta.CHIMP_PANTRO4

Example

The following example demonstrates how SMBL can be used for automatic installation of software.

Create an empty file named Snakefile with the following content:

import smbl
include: smbl.include()

rule all:
        input:
                smbl.prog.DWGSIM,
                smbl.prog.BWA,
                smbl.fasta.EXAMPLE
        params:
                PREF="simulated_reads",
                INDEX="bwa_index"
        output:
                "alignment.sam"
        run:
                # read simulation
                shell("{input[0]} -C 1 {input[2]} {params.PREF}")

                # creating BWA index of the reference sequence
                shell("{input[1]} index {input[2]}")

                # mapping by BWA
                shell("{input[1]} mem {input[2]} {params.PREF}.bfast.fastq > alignment.sam")

Run the script.

snakemake

What happens:

  1. An example FASTA file is downloaded
  2. DwgSim and BWA are downloaded, compiled and installed
  3. DwgSim simulates reads from the example Fasta file
  4. These reads are mapped back to the reference by BWA (alignment.sam is created)