seqpipe

Sequencing pipeline


Keywords
bioinformatics, sequencing, pipeline, alignment, mapping
License
MIT
Install
pip install seqpipe==0.0.7

RNAseq Analysis Pipeline

Installation

Install seqpipe using pip:

$ pip install seqpipe

seqpipe is driven from the command line and provides several subcommands:

$ seqpipe --help
Usage: seqpipe [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  map
  stats

The map command runs the mapping pipeline and accepts the following options:

$ seqpipe map --help
Usage: seqpipe map [OPTIONS]

Options:
  -r, --read PATH             Path to read file/directory.  [required]
  -g, --genome PATH           Path to genome file/directory.  [required]
  -o, --output DIRECTORY      Directory to save results to.
  --scripts / --no-scripts    Whether to execute scripts or not.
  -m, --min-read-len INTEGER  Minimal read length.
  -M, --max-read-len INTEGER  Maximal read length.
  -b, --bowtie-args TEXT      Extra arguments for bowtie.
  -t, --threads INTEGER       How many threads to run in.
  --help                      Show this message and exit.
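The -m and -M options restrict mapping to reads within a length range. As a rough illustration of what such a filter does, here is a minimal Python sketch. It is not seqpipe's actual implementation: the function names parse_fastq and filter_by_length are hypothetical, the input is assumed to be plain 4-line FASTQ, and treating both bounds as inclusive is an assumption.

```python
from typing import Iterable, Iterator, Optional, Tuple

# One FASTQ record as (header, sequence, quality).
# Hypothetical minimal representation, not a seqpipe type.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text (no wrapped sequences)."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes lines in groups of four;
    # an incomplete trailing record is silently dropped.
    for header, seq, _separator, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
) -> Iterator[FastqRecord]:
    """Keep records whose sequence length lies in [min_len, max_len].

    Assumption: both bounds are inclusive; None disables a bound.
    """
    for header, seq, qual in records:
        if min_len is not None and len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        yield header, seq, qual
```

With an open file handle, list(filter_by_length(parse_fastq(handle), min_len=20, max_len=30)) would keep only reads of 20 to 30 bases under these assumptions.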

Usage

To map a directory of reads against two references, pass -g once per reference:

$ seqpipe map \
  -r seqpipe/tests/data/reads/ \
  -g seqpipe/tests/data/references/10-ref.fa \
  -g seqpipe/tests/data/references/20-ref.fa \
  -o my_mapping

This creates a my_mapping directory containing two subdirectories:

  • runs: stores all data related to each individual read file
  • results: contains data generated by the scripts in the scripts folder
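When many mappings have to be started, the invocation above can be assembled from Python. The following is a minimal sketch that uses only the options listed in the seqpipe map --help output; the helper names build_map_command and expected_output_dirs are hypothetical and not part of seqpipe.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Tuple


def build_map_command(
    reads: str,
    genomes: Sequence[str],
    output: Optional[str] = None,
    threads: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `seqpipe map` (hypothetical helper)."""
    if not genomes:
        raise ValueError("at least one genome is required")
    cmd = ["seqpipe", "map", "-r", reads]
    # -g is repeated once per reference, as in the shell example.
    for genome in genomes:
        cmd += ["-g", genome]
    if output is not None:
        cmd += ["-o", output]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


def expected_output_dirs(output: str) -> Tuple[Path, Path]:
    """Return the runs and results subdirectories described above."""
    base = Path(output)
    return base / "runs", base / "results"


cmd = build_map_command(
    "seqpipe/tests/data/reads/",
    [
        "seqpipe/tests/data/references/10-ref.fa",
        "seqpipe/tests/data/references/20-ref.fa",
    ],
    output="my_mapping",
)
# To actually run it (requires seqpipe to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.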

An overview of the read distributions can then be generated via:

$ seqpipe stats plot_rdist -o my_images/ my_mapping/

Extras

Additional useful scripts are located in the extra directory. The entry point is main.py; run python ./extra/main.py --help for usage.

The respective individual files are:

  • sequential_pipeline.sh
    • map length-filtered reads against multiple genomes in succession
  • plot_sequential_data.py
    • visualize data obtained from sequential pipeline
  • plot_expression_differences.py
    • visualize differences in RNAseq expression levels over pairs of samples

Dependencies

Tools:

Languages:

  • bash
  • python
    • numpy
    • scipy
    • pandas
    • seaborn
    • matplotlib
    • tqdm
    • biopython
    • pysam
    • joblib
    • click
    • sh
    • colorama

Development notes

Tests

Run tests using:

$ tox

Release package

This guide assumes a properly set up ~/.pypirc.

Build package:

$ python setup.py sdist

Register it (only once):

$ twine register dist/seqpipe-X.Y.Z.tar.gz

Try installation locally:

$ rm -rf /tmp/seqpipe_tmp
$ virtualenv /tmp/seqpipe_tmp
$ /tmp/seqpipe_tmp/bin/pip install dist/seqpipe-X.Y.Z.tar.gz
$ /tmp/seqpipe_tmp/bin/seqpipe --help

Try the installation via the test server:

$ twine upload -r test dist/seqpipe-X.Y.Z.tar.gz
$ pip install -i https://test.pypi.org/simple/ seqpipe
$ seqpipe --help

Check the TestPyPI page.

Finally, upload it to the actual server and install it from there:

$ twine upload dist/seqpipe-X.Y.Z.tar.gz
$ pip install -U seqpipe
$ seqpipe --help

Check the actual PyPI page.

Misc

Install a development build in editable mode with:

$ pip install --user -e .

Run the uninstalled version:

$ python -m seqpipe.main

Enable bash autocompletion as follows:

$ _SEQPIPE_COMPLETE=source seqpipe >> ~/.bashrc
$ . ~/.bashrc