BGlab Smart-Seq2 preprocessing toolkit
Introduction
smqpp is a preprocessing pipeline for Smart-Seq2 data, designed specifically for datasets generated by the Gottgens lab. The quality control (QC) code was adopted from the bglab package developed by Wajid Jawaid.
The package contains the following steps:
Preanalysis:
- generate_feature_table: Generate a gene feature table if one is not already available
- read_in_files: Read in count and QC inputs and format them into an AnnData object
- reformat_meta: Reformat the metadata table so that all versions are consistent (different versions of the metadata spreadsheet exist on Google Drive)
- smartseq_qc: Quality control equivalent to that in bglab
- normalise_data: Data normalisation using the DESeq2 method
- quantile_norm: Quantile normalisation
- quantile_norm_log: Log quantile normalisation
- downsampling_norm: Downsampling normalisation (not recommended for 10x data, as it sets even more counts to zero)
- tech_var: Highly variable gene (HVG) calculation using the method of Brennecke et al.
- plot_tech_var: Plot the HVG prediction
- detect_outlier_cells: Filter out cells that affect the selection of HVGs
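The two classical normalisation techniques named above can be illustrated with a small NumPy sketch. This is not smqpp's code: it assumes a dense cells x genes matrix (the orientation of an AnnData `X`), and the helper names `deseq2_size_factors` and `quantile_normalise` are hypothetical.

```python
import numpy as np


def deseq2_size_factors(counts):
    """DESeq2-style median-of-ratios size factors, one per cell.

    counts: cells x genes array of raw counts. Only genes with a non-zero
    count in every cell contribute to the per-gene geometric-mean reference,
    so at least one such gene is required.
    """
    counts = np.asarray(counts, dtype=float)
    usable = np.all(counts > 0, axis=0)
    log_counts = np.log(counts[:, usable])
    log_ref = log_counts.mean(axis=0)  # log geometric mean of each gene
    # Median over genes of each cell's log ratio to the reference
    return np.exp(np.median(log_counts - log_ref, axis=1))


def quantile_normalise(mat):
    """Force every cell (row) to share the same value distribution.

    The reference distribution is the mean of the sorted rows; each value is
    replaced by the reference value at its within-row rank. Ties are broken
    by position, a simplification of full tie handling.
    """
    mat = np.asarray(mat, dtype=float)
    ranks = np.argsort(np.argsort(mat, axis=1, kind="stable"), axis=1)
    reference = np.sort(mat, axis=1).mean(axis=0)
    return reference[ranks]


counts = np.array([[10, 20, 30], [20, 40, 60], [5, 10, 15]])
size_factors = deseq2_size_factors(counts)  # ≈ [1.0, 2.0, 0.5]
normalised = counts / size_factors[:, None]  # every row becomes [10, 20, 30]
```

Because the second and third cells are exact multiples of the first, their size factors recover those multiples and the normalised rows coincide.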
Differential expression analysis:
- plot_ma: MA plot of rank_genes_groups results from Scanpy, used to select significant genes with high confidence
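An MA plot places each gene's average expression (A) on the x axis and its log fold change (M) on the y axis. The sketch below computes both quantities directly from two groups of cells and applies a joint cutoff. It is a hypothetical illustration: the function names, the pseudocount and the thresholds in `select_confident` are placeholders, not the criteria `plot_ma` uses. In Scanpy, the fold changes and adjusted p-values from `rank_genes_groups` are stored in `adata.uns['rank_genes_groups']`.

```python
import numpy as np


def ma_values(expr_a, expr_b, pseudocount=1.0):
    """M (log2 fold change) and A (mean log2 expression) for each gene.

    expr_a, expr_b: cells x genes arrays of normalised expression for the two
    groups being compared. The pseudocount (an assumption) avoids log2(0).
    """
    log_a = np.log2(np.asarray(expr_a, dtype=float).mean(axis=0) + pseudocount)
    log_b = np.log2(np.asarray(expr_b, dtype=float).mean(axis=0) + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)


def select_confident(m, a, padj, min_abs_m=1.0, min_a=1.0, max_padj=0.05):
    """Boolean mask of genes passing hypothetical M, A and p-value cutoffs."""
    m, a, padj = (np.asarray(v, dtype=float) for v in (m, a, padj))
    return (np.abs(m) >= min_abs_m) & (a >= min_a) & (padj <= max_padj)


m, a = ma_values([[7, 3], [7, 3]], [[1, 3], [1, 3]])
keep = select_confident(m, a, padj=[0.01, 0.01])
# An MA plot scatters a (x axis) against m (y axis), highlighting `keep`.
```

Requiring a minimum A as well as a large |M| discards genes whose fold change rests on very low counts.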
Pseudotime analysis:
- GeneExp_LLR_test: Likelihood ratio test to select genes that are differentially expressed along pseudotime (PT). Linear models are fitted between log-normalised expression and PT smoothed with a natural spline.
- plot_genes_along_pt: Plot gene expression patterns along PT. Gene expression is smoothed using a Gaussian filter.
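The pseudotime tests can be approximated as follows. This is a sketch under stated assumptions, not smqpp's implementation: it builds a natural cubic spline basis by hand (truncated-power form, knots at quantiles of PT), compares an intercept-only model with an intercept-plus-spline model using a Gaussian likelihood-ratio test, and smooths expression along PT with `scipy.ndimage.gaussian_filter1d`. The helper names, the number of knots and `sigma` are hypothetical defaults.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import chi2


def natural_spline_basis(x, n_knots=4):
    """Natural cubic spline basis in truncated-power form, without intercept.

    Knots sit at evenly spaced quantiles of x (assumed distinct). Returns
    n_knots - 1 columns; the fitted curve is linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots))
    last = knots[-1]

    def d(k):
        return (np.clip(x - knots[k], 0, None) ** 3
                - np.clip(x - last, 0, None) ** 3) / (last - knots[k])

    return np.column_stack([x] + [d(k) - d(n_knots - 2) for k in range(n_knots - 2)])


def llr_test(expr, pt, n_knots=4):
    """Gaussian likelihood-ratio test of expr ~ spline(pt) against expr ~ 1."""
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    design = np.column_stack([np.ones(n), natural_spline_basis(pt, n_knots)])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    rss_full = np.sum((expr - design @ beta) ** 2)
    rss_null = np.sum((expr - expr.mean()) ** 2)
    # 2 * (logL_full - logL_null) with maximum-likelihood variance estimates
    stat = n * np.log(rss_null / rss_full)
    return stat, chi2.sf(stat, df=design.shape[1] - 1)


def smooth_along_pt(expr, pt, sigma=3.0):
    """Order cells by PT and smooth expression with a Gaussian filter."""
    order = np.argsort(pt)
    smoothed = gaussian_filter1d(np.asarray(expr, dtype=float)[order], sigma=sigma)
    return np.asarray(pt)[order], smoothed


rng = np.random.default_rng(0)
pt = rng.uniform(0, 1, 200)
dynamic = 2 * np.sin(3 * pt) + rng.normal(0, 0.3, 200)  # varies along PT
flat = rng.normal(0, 0.3, 200)                          # noise only
stat_dyn, p_dyn = llr_test(dynamic, pt)
stat_flat, p_flat = llr_test(flat, pt)
```

The test has `n_knots - 1` degrees of freedom, the number of spline columns the null model lacks; in practice the p-values from many genes would be adjusted for multiple testing before selection.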
Projection:
- quick_neighbors: Neighbors calculation adapted from Scanpy, with two constraints: 1) reference cells can only have other reference cells as neighbors; 2) new cells can only have reference cells as neighbors
- quick_umap: Similar to the ingest function in Scanpy. UMAP is calculated using the umap Python package with Scanpy's default parameters.
- quick_umap_proj: Projection of new data onto reference data
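The two neighbor constraints used by quick_neighbors can be expressed with a brute-force distance search. The sketch below is hypothetical: `constrained_knn` is not an smqpp function, it assumes a cells x features matrix (for example PCA coordinates) and a boolean mask marking reference cells, it uses exact Euclidean distances rather than the neighbor search used by Scanpy, and `k` must be smaller than the number of reference cells.

```python
import numpy as np


def constrained_knn(X, is_ref, k=3):
    """k nearest neighbors drawn only from reference cells.

    X: cells x features matrix; is_ref: boolean mask of reference cells.
    Reference cells get neighbors among the other reference cells (never
    themselves); new cells get neighbors among reference cells only.
    Returns (indices into X, distances), each of shape (n_cells, k).
    """
    X = np.asarray(X, dtype=float)
    ref_idx = np.flatnonzero(np.asarray(is_ref, dtype=bool))
    # Euclidean distance from every cell to every reference cell
    dist = np.linalg.norm(X[:, None, :] - X[None, ref_idx, :], axis=2)
    # Exclude each reference cell from its own neighbor list
    dist[ref_idx, np.arange(ref_idx.size)] = np.inf
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return ref_idx[order], np.take_along_axis(dist, order, axis=1)


X = np.array([[0.0], [1.0], [2.0], [10.0], [0.4], [0.5]])
is_ref = np.array([True, True, True, True, False, False])
idx, dist = constrained_knn(X, is_ref, k=2)
```

Here the new cell at 0.4 gets reference cells 0 and 1 as neighbors even though the other new cell, at 0.5, is closer; new cells therefore never influence each other's positions in the projection.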
3D plots:
- plot_3d: Generate 3D plots from an AnnData object, since projection='3d' does not work properly in recent Scanpy versions due to matplotlib issues
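A 3D scatter can be drawn with Matplotlib directly, bypassing Scanpy's plotting layer. This is a hypothetical helper, not smqpp's plot_3d: it assumes the coordinates have already been extracted from the AnnData object, for example `adata.obsm['X_pca'][:, :3]` or a UMAP computed with `sc.tl.umap(adata, n_components=3)`, and the function name and defaults are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def scatter_3d(coords, values=None, cmap="viridis", point_size=10):
    """Scatter the first three columns of `coords` on a 3D Matplotlib axis.

    coords: cells x (>= 3) array; values: optional per-cell numbers for colour.
    """
    coords = np.asarray(coords, dtype=float)
    fig = plt.figure(figsize=(6, 5))
    ax = fig.add_subplot(projection="3d")
    # Only pass colour-mapping arguments when there is something to map
    color_kwargs = {} if values is None else {"c": values, "cmap": cmap}
    points = ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2],
                        s=point_size, **color_kwargs)
    if values is not None:
        fig.colorbar(points, ax=ax, shrink=0.7)
    ax.set_xlabel("dim 1")
    ax.set_ylabel("dim 2")
    ax.set_zlabel("dim 3")
    return fig, ax


rng = np.random.default_rng(1)
fig, ax = scatter_3d(rng.normal(size=(100, 3)), values=rng.uniform(size=100))
```

In a notebook, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or save with `fig.savefig(...)`.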
Pathway analysis:
- pathway_score_cal: Calculate, for each cell, the geometric mean for each term in the database; the resulting scores can be used to colour a chosen layout
- pathway_analysis: Identify the database terms enriched in a given gene set using a hypergeometric test.
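Both pathway calculations can be sketched with standard tools. The sketch below is hypothetical and independent of smqpp: `term_score` takes the geometric mean of a term's genes in each cell (with an assumed pseudocount of 1, since the description does not say how zeros are handled), and `term_enrichment_p` applies a one-sided hypergeometric test with `scipy.stats.hypergeom`, treating the gene universe, the term and the query gene set as plain Python collections.

```python
import numpy as np
from scipy.stats import hypergeom


def term_score(expr, gene_names, term_genes, pseudocount=1.0):
    """Per-cell geometric-mean expression of the genes in one database term.

    expr: cells x genes normalised expression. The pseudocount (an assumption)
    keeps zeros from forcing the geometric mean to zero. At least one term
    gene must be present in gene_names.
    """
    lookup = {g: i for i, g in enumerate(gene_names)}
    cols = [lookup[g] for g in term_genes if g in lookup]
    sub = np.asarray(expr, dtype=float)[:, cols]
    return np.exp(np.log(sub + pseudocount).mean(axis=1)) - pseudocount


def term_enrichment_p(gene_set, term_genes, universe):
    """One-sided hypergeometric p-value P(overlap >= observed)."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    term = set(term_genes) & universe
    overlap = len(gene_set & term)
    # hypergeom(M=population size, n=term genes in population, N=draws)
    return hypergeom.sf(overlap - 1, len(universe), len(term), len(gene_set))


expr = np.array([[3.0, 3.0, 5.0], [0.0, 0.0, 5.0]])
scores = term_score(expr, ["A", "B", "C"], ["A", "B"])  # one value per cell
universe = [f"g{i}" for i in range(10)]
p = term_enrichment_p(["g0", "g1", "g2"], ["g0", "g1", "g2"], universe)
```

The survival function is evaluated at `overlap - 1` because `sf(k)` gives P(X > k), so this returns the probability of seeing at least the observed overlap by chance.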
Installation
smqpp depends on numpy, matplotlib, pandas, anndata, scipy and statsmodels. The package is available on PyPI and can be installed with pip:
pip install smqpp
or install from source: download the code from GitHub (with git clone, or by downloading the source archive and extracting it with tar zxvf), then run:
cd smqpp
pip install .
Usage and Documentation
smqpp is intended to be simple to use and works on AnnData objects, the data structure used by Scanpy:
import smqpp
smqpp.read_in_files(...)
Example Notebooks
Examples can be found in the following folders:
Contact
If there are any issues, please contact xw251@cam.ac.uk.