Single-Cell ATAC-seq analysis via Latent feature Extraction
The SCALE neural network is implemented in the PyTorch framework.
Running SCALE on CUDA is recommended if available.
Install from GitHub:
git clone git://github.com/jsxlei/SCALE.git
cd SCALE
python setup.py install
Installation only requires a few minutes.
- h5ad file
- count matrix file: rows are peaks and columns are barcodes, in txt / tsv (sep="\t") or csv (sep=",") format
- mtx folder containing three files:
  - count file: counts in mtx format; filename contains the keyword "count" or "matrix"
  - peak file: one column of peaks written as chr_start_end; filename contains the keyword "peak"
  - barcode file: one column of barcodes; filename contains the keyword "barcode"
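As a rough illustration of the mtx folder layout, the sketch below writes a tiny count matrix in Matrix Market coordinate format together with a peak file and a barcode file, using only the Python standard library. The helper name `write_mtx_folder`, the file names `count.mtx`, `peak.txt` and `barcode.txt`, and the toy peaks and barcodes are all assumptions made for this example; the file names were chosen only so that they contain the required keywords. It also assumes that the mtx matrix uses peaks as rows and barcodes as columns, as described for the text formats.

```python
import os
import tempfile


def write_mtx_folder(out_dir, dense_counts, peaks, barcodes):
    """Write a peaks-by-barcodes count matrix as an mtx folder.

    Hypothetical helper: the file names below are illustrative and only
    need to contain the keywords "count"/"matrix", "peak" and "barcode".
    """
    if len(dense_counts) != len(peaks):
        raise ValueError("expected one row per peak")
    if any(len(row) != len(barcodes) for row in dense_counts):
        raise ValueError("expected one column per barcode")

    os.makedirs(out_dir, exist_ok=True)

    # Collect non-zero entries; Matrix Market indices are 1-based.
    entries = [
        (i + 1, j + 1, value)
        for i, row in enumerate(dense_counts)
        for j, value in enumerate(row)
        if value != 0
    ]

    with open(os.path.join(out_dir, "count.mtx"), "w") as handle:
        handle.write("%%MatrixMarket matrix coordinate integer general\n")
        handle.write(f"{len(peaks)} {len(barcodes)} {len(entries)}\n")
        for i, j, value in entries:
            handle.write(f"{i} {j} {value}\n")

    # One peak per line, written as chr_start_end.
    with open(os.path.join(out_dir, "peak.txt"), "w") as handle:
        handle.write("\n".join(peaks) + "\n")

    # One barcode per line.
    with open(os.path.join(out_dir, "barcode.txt"), "w") as handle:
        handle.write("\n".join(barcodes) + "\n")

    return sorted(os.listdir(out_dir))


# Toy data: 2 peaks x 3 barcodes, written to a temporary folder.
demo_dir = os.path.join(tempfile.mkdtemp(), "demo_mtx")
demo_files = write_mtx_folder(
    demo_dir,
    dense_counts=[[1, 0, 2], [0, 0, 1]],
    peaks=["chr1_100_600", "chr2_2000_2500"],
    barcodes=["AAAC", "CCGT", "GGTA"],
)
```

The resulting folder path (here `demo_dir`) is what would be passed as the input to SCALE.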
SCALE.py -d [input]
If the cluster number k is known:
SCALE.py -d [input] -k [k]
Results are saved in the output folder, which includes:
- model.pt: saved model, which can be used with the --pretrain option to reproduce results
- adata.h5ad: saved data including Leiden cluster assignment, latent feature matrix and UMAP results.
- umap.pdf: visualization of 2d UMAP embeddings of each cell
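After a run, it can be handy to confirm that the files listed above were actually produced. The following is a minimal sketch using only the standard library; the helper name `missing_outputs` is hypothetical, and the file list assumes the default UMAP embedding (the plot file may be named differently with other settings).

```python
import os
import tempfile

# File names taken from the output list above.
EXPECTED_OUTPUTS = ("model.pt", "adata.h5ad", "umap.pdf")


def missing_outputs(outdir):
    """Return the expected SCALE output files that are absent from outdir."""
    return [
        name for name in EXPECTED_OUTPUTS
        if not os.path.isfile(os.path.join(outdir, name))
    ]


# Demonstration on a throwaway folder that holds only a placeholder model.pt.
demo_out = tempfile.mkdtemp()
open(os.path.join(demo_out, "model.pt"), "w").close()
demo_missing = missing_outputs(demo_out)
```

An empty list means that all three files are present in the folder.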
Get binary imputed data in the binary_imputed folder with the --binary option (recommended to save storage):
SCALE.py -d [input] --binary
or get numerical imputed data in the file imputed_data.txt with the --impute option:
SCALE.py -d [input] --impute
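When SCALE is driven from a Python script, the command lines above can be assembled programmatically. The sketch below is one possible way to do that; `build_scale_command` is a hypothetical helper, `data/forebrain` is a placeholder path, and only the flags shown above (-d, -k, --binary, --impute) are used.

```python
import shlex


def build_scale_command(input_path, k=None, binary=False, impute=False):
    """Assemble a SCALE.py argument list from the options shown above."""
    command = ["SCALE.py", "-d", str(input_path)]
    if k is not None:
        # Only pass -k when the cluster number is known.
        command += ["-k", str(k)]
    if binary:
        command.append("--binary")
    if impute:
        command.append("--impute")
    return command


# Placeholder input path; replace with a real h5ad file, matrix file or mtx folder.
demo_command = build_scale_command("data/forebrain", k=8, binary=True)
demo_command_line = shlex.join(demo_command)
```

The list form can be handed to `subprocess.run(demo_command, check=True)` once SCALE is installed, while `demo_command_line` is convenient for logging. The text presents --binary and --impute as alternatives, so setting both flags at once is untested here.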
- save results in a specific folder: [-o] or [--outdir]
- embed feature by tSNE or UMAP: [--embed] tSNE/UMAP
- filter low quality cells by valid peaks number, default 100: [--min_peaks]
- filter low quality peaks by valid cells number, default 10: [--min_cells]
- modify the initial learning rate, default is 0.002: [--lr]
- change the maximum number of iterations, guided by the convergence of the loss, default is 30000: [-i] or [--max_iter]
- change random seed for parameter initialization, default is 18: [--seed]
- binarize the imputation values: [--binary]
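To make the two filtering options concrete, here is a small pure-Python sketch of what --min_peaks and --min_cells are described as doing on a peaks-by-cells matrix. It is not SCALE's own code: the function name `filter_counts`, the order of the two filters (cells first, then peaks) and the use of "greater than or equal to" thresholds are assumptions made for illustration.

```python
def filter_counts(counts, min_peaks=100, min_cells=10):
    """Filter a peaks-by-cells matrix given as a list of rows.

    Assumption: cells are filtered first, then peaks, with >= thresholds.
    Returns (filtered_counts, kept_peak_indices, kept_cell_indices).
    """
    n_cells = len(counts[0]) if counts else 0

    # Keep cells that have at least min_peaks non-zero peaks.
    kept_cells = [
        j for j in range(n_cells)
        if sum(1 for row in counts if row[j] > 0) >= min_peaks
    ]

    # Among the kept cells, keep peaks observed in at least min_cells cells.
    kept_peaks = [
        i for i, row in enumerate(counts)
        if sum(1 for j in kept_cells if row[j] > 0) >= min_cells
    ]

    filtered = [[counts[i][j] for j in kept_cells] for i in kept_peaks]
    return filtered, kept_peaks, kept_cells


# Toy matrix: 3 peaks (rows) x 4 cells (columns), with small thresholds.
toy = [
    [1, 0, 3, 0],
    [2, 0, 1, 0],
    [0, 0, 0, 1],
]
toy_filtered, toy_peaks, toy_cells = filter_counts(toy, min_peaks=2, min_cells=2)
```

In this toy case, cells 1 and 3 are dropped for having fewer than two non-zero peaks, after which peak 2 no longer appears in any remaining cell and is dropped as well.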
For more usage options, run SCALE.py --help.
Use functions in SCALE packages.
import scale
from scale import *
from scale.plot import *
from scale.utils import *
- Tutorial Forebrain: run SCALE on the dense-matrix Forebrain dataset (k=8, 2088 cells)
- Tutorial Mouse Atlas: run SCALE on the sparse-matrix Mouse Atlas dataset (k=30, ~80,000 cells)
Lei Xiong, Kui Xu, Kang Tian, Yanqiu Shao, Lei Tang, Ge Gao, Michael Zhang, Tao Jiang & Qiangfeng Cliff Zhang. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nature Communications, (2019).