This pipeline is based on snakemake and the dropseq tools provided by the McCarroll Lab. It takes you from the raw data of your dropSeq/scrbSeq experiment to the final count matrix, with QC plots along the way. This is the tool we use in our lab to improve our wetlab protocol, and it provides an easy framework to reproduce and compare different experiments with different parameters.
It uses STAR to align. It works for both single cell and bulk data. However, since bulk data sequencing is not the main goal, don't expect good support and improvement on this end.
The short term goal is to make it usable with as many existing single cell protocols as possible. It has been successfully tested on:
This package is trying to be as user friendly as possible. One of the hopes is that non-bioinformaticians can make use of it without too much hassle. It will still require some command line execution; this will not be an interactive package.
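To make the end product concrete, here is a minimal Python sketch of what a count matrix holds, assuming each aligned read has already been reduced to a (cell barcode, UMI, gene) triple. The function name `build_count_matrix` and the toy records are hypothetical. The pipeline itself relies on the dropseq tools for this step, and this sketch is not their implementation.

```python
from collections import defaultdict


def build_count_matrix(records):
    """Build a gene-by-cell count table from (cell_barcode, umi, gene) records.

    Each distinct UMI seen for a (gene, cell) pair counts once, so repeated
    reads of the same molecule do not inflate the count.
    """
    umis_seen = defaultdict(set)
    for cell, umi, gene in records:
        umis_seen[(gene, cell)].add(umi)

    genes = sorted({gene for gene, _ in umis_seen})
    cells = sorted({cell for _, cell in umis_seen})
    # Rows are genes, columns are cells; missing pairs count as zero.
    matrix = [
        [len(umis_seen.get((gene, cell), ())) for cell in cells]
        for gene in genes
    ]
    return genes, cells, matrix


# Toy input (hypothetical barcodes, UMIs and gene names).
records = [
    ("AAAC", "TTGA", "Actb"),
    ("AAAC", "TTGA", "Actb"),  # same cell, UMI and gene: counted once
    ("AAAC", "CCGT", "Actb"),
    ("GGTC", "TTGA", "Actb"),
    ("GGTC", "AACC", "Gapdh"),
]

genes, cells, matrix = build_count_matrix(records)
for gene, row in zip(genes, matrix):
    print(gene, dict(zip(cells, row)))
```

Counting distinct UMIs rather than reads is what separates a UMI count matrix from a plain read count matrix.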
CALL FOR SHARING PLOTS
One main idea of open source packages and data is that the community as a whole is stronger than any of us working alone. I would like to open a platform to share the plots generated by the pipeline so that we can all learn from each other and improve our own wet lab protocols. I have not defined how to do this and am open to suggestions.
- pre_align steps will output a fastq.gz instead of a fastq file.
- fastqc.R is now compatible with paired and single end data.
- Changed a few of the Cell_barcodes options. It is now possible to change filtering settings. See the wiki.
- STAR logs have been stripped of the STAR string. This is to allow for better compatibility with multiqc.
- Moved items from the fastqc folder to the logs folder, grouping all log files for better multiqc compatibility.
- generate-meta for keeping similar syntax between modes.
- Added separate log files for stats and summary in DetectBeadSynthesisErrors.
- Moved part of the README to the wiki.
- Changed the name of the first expression matrix extracted before the species plot to
- You can now run bulk single or paired end RNA-seq data.
- Started a wiki with a FAQ
- Added options in GLOBAL config.yaml. You can now choose a range of options for UMI and barcode filtering. Please refer to the wiki for more information.
- Support for MultiQC. MultiQC is a great way of summarising all of the logs from your experiment. As of today it supports 46 different modules (such as fastqc, trimmomatic, STAR, etc.). The generate-plots mode now produces a multiqc_report.html file in the plots folder.
- New plot! BCDrop.pdf is a new plot showing how many barcodes and UMIs were dropped from the raw data before aligning. This helps to track how many samples you might lose because of low quality reads in the barcoding.
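The kind of tally behind a plot like BCDrop.pdf can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes read 1 carries a 12 bp cell barcode followed by an 8 bp UMI with Phred+33 qualities, and the thresholds `min_quality` and `max_low_bases` as well as both function names are hypothetical. In the pipeline the actual filtering is configured through the options described in the wiki.

```python
def count_low_quality(quality, start, end, min_quality, offset=33):
    """Number of bases in quality[start:end] with a Phred score below min_quality."""
    return sum(1 for ch in quality[start:end] if ord(ch) - offset < min_quality)


def tally_dropped_reads(qualities, barcode=(0, 12), umi=(12, 20),
                        min_quality=10, max_low_bases=0):
    """Classify each read as dropped for its barcode, dropped for its UMI, or kept.

    A read whose barcode fails is counted under "barcode_dropped" even if its
    UMI would also fail, so the three categories always add up to "total".
    The layout and thresholds are assumptions made for this sketch.
    """
    counts = {"total": 0, "barcode_dropped": 0, "umi_dropped": 0, "kept": 0}
    for quality in qualities:
        counts["total"] += 1
        if count_low_quality(quality, barcode[0], barcode[1], min_quality) > max_low_bases:
            counts["barcode_dropped"] += 1
        elif count_low_quality(quality, umi[0], umi[1], min_quality) > max_low_bases:
            counts["umi_dropped"] += 1
        else:
            counts["kept"] += 1
    return counts


# Toy quality strings: "I" is Phred 40, "#" is Phred 2.
qualities = [
    "I" * 20,                   # clean read
    "#" + "I" * 19,             # low quality base in the barcode
    "I" * 12 + "#" + "I" * 7,   # low quality base in the UMI
    "#" * 20,                   # low quality everywhere
]

counts = tally_dropped_reads(qualities)
print(counts)
```

Comparing these counts between runs gives a quick sense of whether a change in the wet lab protocol improved barcode read quality.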
Before using it you will need to install some software:
Once you have everything just run:
    git clone https://github.com/Hoohm/dropSeqPipe
    cd dropSeqPipe
    sudo python3 setup.py install
This will also automatically install all the R packages needed. (Warnings will pop up, but it should work.)
Please check our Wiki before trying to run the pipeline.
Future implementations (ordered by urgency)
- Integration of sircel for cell barcode selection
- Integration of UMI-tools for UMI selection
- Integration of kallisto for pseudoalignment
- Test data for automatic test of the pipeline
- Cluster version
- Cross-language dependency installation (based on conda)
- Mixed reference genome generation
- Adding specificity on the knee-plot for mixed experiments
- RData object of all the summary data and plots so that you can create your own report.
- Docker for the package.
- Add custom dropseqtools with TMPDIR
I hope it can help you out in your drop-seq experiment!
Feel free to comment and point out potential improvements.