dropSeqPipe

A drop-seq pipeline


Keywords
single, cell, conda, drop-seq, dropseq, dropseqtools, multiqc, picard, pipeline, plot, reference-genome, scrb-seq, scrbseq, snakemake, star, umi, yaml
License
GPL-3.0
Install
pip install dropSeqPipe==0.23a0

Description

This pipeline is based on snakemake and the dropseq tools provided by the McCarroll Lab. It takes you from the raw data of your dropSeq/scrbSeq experiment to the final count matrix, with QC plots along the way. This is the tool we use in our lab to improve our wetlab protocol, and it provides an easy framework to reproduce and compare different experiments with different parameters.

It uses STAR for alignment and works for both single cell and bulk data. However, bulk data sequencing is not the main goal, so don't expect good support or improvements on that end.

The short term goal is to make it usable with as many existing single cell protocols as possible. It has been successfully tested on:

  • DropSeq
  • SCRBSeq
  • 10x

This package tries to be as user friendly as possible. One of the hopes is that non-bioinformaticians can make use of it without too much hassle. It will still require some command line execution; this will not be an interactive package.

CALL FOR SHARING PLOTS

One main idea of open source packages and data is that the community as a whole is stronger than any of us working alone. I would like to open a platform to share the plots generated by the pipeline so that we can all learn from each other to improve our own wet lab protocols. I have not yet defined how to do this and am open to suggestions.

[0.23]

Changed

  • pre_align steps will output a fastq.gz instead of a fastq file.
  • fastqc.R is now compatible with paired and single end data.
  • Changed a few options in GLOBAL for UMI and Cell_barcodes. It is now possible to change filtering settings. See the wiki.
  • STAR logs have been stripped of the STAR string. This allows for better compatibility with multiqc.
  • Removed fastqc folder and moved items to logs folder. All log files are now grouped together for better multiqc compatibility.
  • Changed generate_meta to generate-meta to keep the syntax similar between modes.
  • Added separate log files for stats and summary in the DetectBeadSynthesisErrors step.
  • Moved part of the README to the wiki.
  • Changed the name of the first expression matrix extracted before the species plot to unfiltered_expression.
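
Since the pre_align steps now write gzip-compressed FASTQ, any downstream script that previously read the plain fastq file has to open it through gzip. The following is a minimal sketch of that, not part of the pipeline itself. The function name count_fastq_records is hypothetical, and the code assumes well-formed FASTQ with exactly four lines per record.

```python
import gzip
import sys


def count_fastq_records(path):
    """Count reads in a gzip-compressed FASTQ file (assumes four lines per record)."""
    n_lines = 0
    # "rt" opens the compressed file in text mode so lines are decoded strings.
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path comes from the command line; no pipeline file name is assumed here.
    print(count_fastq_records(sys.argv[1]))
```

Run it with the path to one of your fastq.gz files as the only argument to print the number of reads it contains.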

Added

  • You can now run bulk single-end or paired-end RNAseq data.
  • Started a wiki with a FAQ.
  • Added options in GLOBAL config.yaml. You can now choose a range of options for UMI and barcode filtering. Please refer to the wiki for more information.
  • Support for MultiQC. MultiQC is a great way of summarising all of the logs from your experiment. As of today it supports 46 different modules (such as fastqc, trimmomatic, STAR, etc.). The generate-plots mode now produces a multiqc_report.html file in the plots folder.
  • New plot! BCDrop.pdf shows how many barcodes and UMIs were dropped from the raw data before aligning. This helps to track how many samples you might lose because of low quality reads in the barcoding.

Installation

Before using it you will need to install the following software:

  1. R
  2. STAR aligner
  3. Drop-seq tools (1.12)
  4. Picard tools
  5. fastqc
  6. Python3

Once you have everything just run:

git clone https://github.com/Hoohm/dropSeqPipe
cd dropSeqPipe
sudo python3 setup.py install

This will also automatically install all the R packages needed. (Warnings may pop up during installation, but it should still work.)

Please check our Wiki before trying to run the pipeline.

Future implementations (ordered by urgency)

Sooner

  • Integration of sircel for cell barcode selection
  • Integration of UMI-tools for UMI selection
  • Integration of kallisto for pseudoalignment
  • Test data for automatic test of the pipeline

Later

  • Cluster version
  • Cross language dependencies installation (based on conda)
  • Mixed reference genome generation
  • Adding specificity on the knee-plot for mixed experiments
  • RData object of all the summary data and plots so that you can create your own report.
  • Docker for the package.
  • Add custom dropseqtools with TMPDIR

I hope it can help you out in your drop-seq experiment!

Feel free to comment and point out potential improvements.