Description
This pipeline is based on Snakemake and the Drop-seq tools provided by the McCarroll Lab. It takes you from the raw data of your dropSeq/scrbSeq experiment to the final count matrix, with QC plots along the way. We use it in our lab both to improve our wet-lab protocol and to provide an easy framework for reproducing and comparing experiments run with different parameters.
It uses STAR for alignment and works for both single-cell and bulk data. However, bulk sequencing is not the main goal, so don't expect strong support or improvements on that front.
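The final count matrix can be illustrated with a minimal Python sketch. For every cell and gene, it counts the distinct UMIs seen in aligned, tagged reads. This is a simplified illustration, not the pipeline's actual code, which relies on the Drop-seq tools. It assumes the reads have already been reduced to hypothetical `(cell_barcode, umi, gene)` tuples, and it skips the barcode and UMI error correction a real run performs.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical input: one (cell_barcode, umi, gene) tuple per aligned read;
# gene is None when the read was not assigned to any gene.
Read = Tuple[str, str, Optional[str]]


def build_count_matrix(reads: Iterable[Read]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, cells, rows) where rows[i][j] is the number of
    distinct UMIs observed for genes[i] in cells[j] (genes x cells)."""
    umis = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for cell, umi, gene in reads:
        if gene is None:
            continue  # unassigned reads do not contribute to the matrix
        umis[(cell, gene)].add(umi)  # duplicate reads of one molecule collapse here

    cells = sorted({cell for cell, _ in umis})
    genes = sorted({gene for _, gene in umis})
    rows = [[len(umis.get((cell, gene), ())) for cell in cells] for gene in genes]
    return genes, cells, rows


if __name__ == "__main__":
    example = [
        ("AAAA", "U1", "Gapdh"),
        ("AAAA", "U1", "Gapdh"),  # PCR duplicate: same cell, UMI and gene
        ("AAAA", "U2", "Gapdh"),
        ("CCCC", "U1", "Actb"),
        ("CCCC", "U3", "Gapdh"),
        ("CCCC", "U9", None),  # not assigned to a gene
    ]
    print(build_count_matrix(example))
```

Counting distinct UMIs rather than reads is what removes PCR duplicates. In the example, the two identical `("AAAA", "U1", "Gapdh")` reads contribute a single count.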
The short-term goal is to make it usable with as many existing single-cell protocols as possible. It has been successfully tested on:
- DropSeq
- SCRBSeq
- 10x
This package tries to be as user-friendly as possible. One of the hopes is that non-bioinformaticians can use it without too much hassle. It still requires running some commands on the command line; this is not an interactive package.
CALL FOR SHARING PLOTS
One main idea behind open-source packages and data is that the community as a whole is stronger than anyone working alone. I would like to open a platform for sharing the plots generated by the pipeline so that we can all learn from each other and improve our own wet-lab protocols. I have not yet defined how to do this and am open to suggestions.
[0.23]
Changed
- The pre_align steps now output a fastq.gz file instead of a plain fastq file.
- fastqc.R is now compatible with both paired-end and single-end data.
- Changed a few options in GLOBAL for the UMI and Cell_barcodes settings. Filtering settings can now be changed; see the wiki.
- The STAR string has been stripped from the STAR logs to improve compatibility with MultiQC.
- Removed the fastqc folder and moved its contents to the logs folder, grouping all log files together for better MultiQC compatibility.
- Renamed generate_meta to generate-meta to keep the syntax consistent across modes.
- Added separate log files for stats and summary in DetectBeadSynthesisErrors.
- Moved part of the README to the wiki.
- Renamed the first expression matrix, extracted before the species plot, to unfiltered_expression.
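Because the pre_align steps now write gzip-compressed FASTQ, any custom downstream script needs a gzip-aware reader. The following is a minimal Python sketch using only the standard library. It assumes standard four-line FASTQ records. The function and record names are hypothetical and are not part of the pipeline.

```python
import gzip
from typing import BinaryIO, Iterator, NamedTuple, Union


class FastqRecord(NamedTuple):
    name: str      # header line without the leading "@"
    sequence: str
    quality: str


def read_fastq_gz(source: Union[str, BinaryIO]) -> Iterator[FastqRecord]:
    """Yield records from a gzip-compressed, four-line-per-record FASTQ.

    `source` may be a file path or an already opened binary file object.
    """
    with gzip.open(source, "rt") as handle:  # "rt" decompresses and decodes to text
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            sequence = handle.readline().rstrip("\n")
            plus = handle.readline()
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"Malformed FASTQ record near {header!r}")
            yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    import sys

    # Usage: python read_fastq_gz.py sample.fastq.gz  (file name is illustrative)
    for record in read_fastq_gz(sys.argv[1]):
        print(record.name, len(record.sequence))
```

Reading line by line keeps memory use constant, which matters for the large FASTQ files typical of single-cell runs.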
Added
- You can now run bulk single-end or paired-end RNA-seq data.
- Started a wiki with a FAQ.
- Added options in the GLOBAL section of config.yaml. You can now choose from a range of options for UMI and barcode filtering; please refer to the wiki for more information.
- Support for MultiQC. MultiQC is a great way of summarising all of the logs from your experiment. As of today it supports 46 different modules (such as fastqc, trimmomatic, STAR, etc.). The generate-plots mode now produces a multiqc_report.html file in the plots folder.
- New plot! BCDrop.pdf shows how many barcodes and UMIs were dropped from the raw data before alignment. This helps you track how many samples you might lose because of low-quality reads in the barcoding.
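The idea behind a barcode-drop count can be sketched in Python. For each barcode read, check whether the cell barcode or UMI bases contain low-quality calls, and tally the reads that would be discarded. This is a hypothetical illustration, not the code that produces BCDrop.pdf. Several values are assumptions chosen for the example:

- The read layout is a 12 bp cell barcode followed by an 8 bp UMI, as on standard Drop-seq beads; SCRB-seq and 10x use different lengths.
- Qualities are encoded as Phred+33.
- A read is dropped once at least one base falls below Q10.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BarcodeDropCounts:
    total: int = 0
    dropped_cell_barcode: int = 0
    dropped_umi: int = 0


def count_low_quality(qual: str, min_quality: int, offset: int = 33) -> int:
    # Number of bases whose Phred score (Phred+33 encoding assumed) is below min_quality
    return sum(1 for ch in qual if ord(ch) - offset < min_quality)


def count_barcode_drops(
    read1_qualities: Iterable[str],
    cell_len: int = 12,         # assumed Drop-seq cell barcode length
    umi_len: int = 8,           # assumed Drop-seq UMI length
    min_quality: int = 10,      # hypothetical quality cutoff
    bad_bases_to_drop: int = 1, # drop once this many bases are below the cutoff
) -> BarcodeDropCounts:
    """Tally reads whose cell barcode or UMI would be filtered for low quality.

    The two categories are counted independently, so one read can be
    counted in both.
    """
    counts = BarcodeDropCounts()
    for qual in read1_qualities:
        counts.total += 1
        cell_q = qual[:cell_len]
        umi_q = qual[cell_len:cell_len + umi_len]
        if count_low_quality(cell_q, min_quality) >= bad_bases_to_drop:
            counts.dropped_cell_barcode += 1
        if count_low_quality(umi_q, min_quality) >= bad_bases_to_drop:
            counts.dropped_umi += 1
    return counts


if __name__ == "__main__":
    quals = [
        "I" * 20,                   # all Q40: kept
        "#" + "I" * 19,             # Q2 in the cell barcode
        "I" * 12 + "#" + "I" * 7,   # Q2 in the UMI
        "#" * 20,                   # low quality everywhere
    ]
    print(count_barcode_drops(quals))
    # BarcodeDropCounts(total=4, dropped_cell_barcode=2, dropped_umi=2)
```

Keeping the cell barcode and UMI tallies separate shows which part of the bead read is losing more data.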
Installation
Before using it, you will need to install some software:
Once you have everything, just run:
git clone https://github.com/Hoohm/dropSeqPipe
cd dropSeqPipe
sudo python3 setup.py install
This will also automatically install all the R packages needed. (Some warnings may appear during installation, but it should still work.)
Please check our Wiki before trying to run the pipeline.
Future implementations (ordered by urgency)
Sooner
- Integration of sircel for cell barcode selection
- Integration of UMI-tools for UMI selection
- Integration of kallisto for pseudoalignment
- Test data for automatic test of the pipeline
Later
- Cluster version
- Cross language dependencies installation (based on conda)
- Mixed reference genome generation
- More specific knee plots for mixed-species experiments
- RData object of all the summary data and plots so that you can create your own report.
- A Docker image for the package.
- Add custom dropseqtools with TMPDIR
I hope it can help you with your Drop-seq experiments!
Feel free to comment and point out potential improvements.