DeltaTopic: Dynamically-Encoded Latent Transcriptomic pattern Analysis by Topic modeling
This is the project repository for our paper:
- Y Zhang, M Khalilitousi, YP Park, Unraveling dynamically-encoded latent transcriptomic patterns in pancreatic cancer cells by topic modelling.
Summary
Topic modelling has become an important research tool in single-cell genomics. A topic model decomposes expression data into distinct cell topics shared across many cells, and the gene program implicated by each topic can later serve as a predictive model in translational studies. Here, we present a Bayesian topic model that uncovers short-term RNA velocity patterns from spliced and unspliced single-cell RNA-seq counts. We show that modelling both types of RNA counts improves robustness in statistical estimation and reveals aspects of dynamic change that static analysis can miss. We demonstrate that our modelling framework can identify statistically significant dynamic gene programs in pancreatic cancer data. We found seven dynamic gene programs (topics) that are highly correlated with cancer prognosis and are generally enriched for immune cell types and pathways.
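To make the idea of a topic decomposition concrete, here is a generic topic-model sketch in NumPy. It assumes, for illustration only, that each cell has one vector of topic proportions shared by its spliced and unspliced counts, and that each count type has its own topic-to-gene distribution. The names `expected_counts`, `theta`, and `rho_*` are hypothetical; DeltaTopic's actual likelihood, priors, and inference are defined in the paper and are not reproduced here.

```python
import numpy as np


def softmax(x, axis):
    # Subtract the max along `axis` before exponentiating for numerical stability
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def expected_counts(theta_logits, beta_s, beta_u, lib_s, lib_u):
    """Mean spliced/unspliced counts under a toy shared-topic model (hypothetical).

    theta_logits:   (n_cells, n_topics) unnormalized cell-topic scores
    beta_s, beta_u: (n_topics, n_genes) unnormalized topic-gene weights
    lib_s, lib_u:   (n_cells,) total spliced / unspliced counts per cell
    """
    theta = softmax(theta_logits, axis=1)  # topic proportions; each row sums to 1
    rho_s = softmax(beta_s, axis=1)        # spliced gene program of each topic
    rho_u = softmax(beta_u, axis=1)        # unspliced gene program of each topic
    mean_s = lib_s[:, None] * (theta @ rho_s)
    mean_u = lib_u[:, None] * (theta @ rho_u)
    return theta, mean_s, mean_u


rng = np.random.default_rng(0)
n_cells, n_topics, n_genes = 5, 3, 10
theta, mean_s, mean_u = expected_counts(
    rng.normal(size=(n_cells, n_topics)),
    rng.normal(size=(n_topics, n_genes)),
    rng.normal(size=(n_topics, n_genes)),
    np.full(n_cells, 1000.0),
    np.full(n_cells, 200.0),
)
print(mean_s.shape, mean_u.shape)  # (5, 10) (5, 10)
```

In this sketch, row k of each `rho_*` matrix plays the role of topic k's gene program, and because both `theta` and `rho_*` rows sum to one, each cell's mean counts add up to its library size.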
Installation
DeltaTopic requires Python 3.8 or later. We recommend using Miniconda.
Install DeltaTopic from PyPI using:
pip install DeltaTopic
To work with the latest development version, install from GitHub using:
python3 -m pip install git+https://github.com/causalpathlab/DeltaTopic
Data
We obtained the original FASTQ files for pancreatic ductal adenocarcinoma (PDAC) from the public repositories of two PDAC studies. We quantified the spliced and unspliced count matrices with kb-python:
kb count -i index.idx -g t2g.txt -x 10xv2 -o ${output} \
-c1 spliced_t2c.txt -c2 unspliced_t2c.txt \
--workflow lamanno --filter bustools \
${fastq1} ${fastq2}
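Downstream, the spliced and unspliced matrices need to describe the same cells and genes. The sketch below reads both layers and keeps only barcodes present in both. It assumes that, for your kb-python version, each layer is written as `<layer>.mtx` (cells x genes) with `<layer>.barcodes.txt` and `<layer>.genes.txt` alongside it (for example under `${output}/counts_filtered/`); newer kb-python releases use different file names, so check your output directory. The helper names `load_kb_layer` and `align_layers` are hypothetical and are not part of DeltaTopic.

```python
from pathlib import Path

from scipy.io import mmread
from scipy.sparse import csr_matrix


def load_kb_layer(outdir, layer):
    """Read one count layer ("spliced" or "unspliced") written by kb count.

    ASSUMPTION: files are named <layer>.mtx (cells x genes),
    <layer>.barcodes.txt and <layer>.genes.txt; adjust for your kb-python version.
    """
    outdir = Path(outdir)
    counts = csr_matrix(mmread(str(outdir / f"{layer}.mtx")))
    barcodes = (outdir / f"{layer}.barcodes.txt").read_text().split()
    genes = (outdir / f"{layer}.genes.txt").read_text().split()
    return counts, barcodes, genes


def align_layers(spliced, s_barcodes, unspliced, u_barcodes):
    """Keep only cells present in both layers, in the spliced layer's order."""
    u_index = {bc: i for i, bc in enumerate(u_barcodes)}
    s_rows = [i for i, bc in enumerate(s_barcodes) if bc in u_index]
    shared = [s_barcodes[i] for i in s_rows]
    u_rows = [u_index[bc] for bc in shared]
    return spliced[s_rows], unspliced[u_rows], shared


# Hypothetical usage; "pdac_output" stands in for ${output}:
# s, s_bc, genes = load_kb_layer("pdac_output/counts_filtered", "spliced")
# u, u_bc, u_genes = load_kb_layer("pdac_output/counts_filtered", "unspliced")
# assert genes == u_genes, "spliced and unspliced gene lists differ"
# s, u, cells = align_layers(s, s_bc, u, u_bc)
```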
Run
# train BALSAM model on the spliced count data
BALSAM --nLV 32 --EPOCHS 100
# train DeltaTopic model on the spliced and unspliced count data
DeltaTopic --nLV 32 --EPOCHS 100
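If you want to compare several model sizes, a small wrapper can issue the two commands above for each setting. This is a sketch that uses only the `--nLV` and `--EPOCHS` options shown in this README and assumes `--nLV` sets the number of latent topics; `training_commands` and `run_sweep` are hypothetical helpers. It sets no output location, so check where each run writes its results before launching a real sweep, or later runs may overwrite earlier ones.

```python
import shlex
import subprocess


def training_commands(n_latent, epochs):
    """Build the BALSAM and DeltaTopic command lines for one setting."""
    args = ["--nLV", str(n_latent), "--EPOCHS", str(epochs)]
    return [["BALSAM", *args], ["DeltaTopic", *args]]


def run_sweep(n_latent_values, epochs, dry_run=True):
    """Print (and, if dry_run is False, execute) the commands for each setting."""
    for n_latent in n_latent_values:
        for cmd in training_commands(n_latent, epochs):
            print(shlex.join(cmd))
            if not dry_run:
                # check=True stops the sweep if a training run exits with an error
                subprocess.run(cmd, check=True)


# Dry run: only prints the commands, nothing is executed
run_sweep([16, 32], epochs=100)
```

Keeping `dry_run=True` as the default lets you inspect the generated commands before committing compute time.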