DISC
An accurate and scalable imputation algorithm based on semi-supervised deep learning for single-cell transcriptome.
- Free software: Apache License 2.0
Requirements
- Python >=3.6
- TensorFlow >=1.13.1,<2.0.0
- numpy >=1.14.0
- pandas >=0.21.0
- h5py >=2.9.0
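The version bounds above can be checked programmatically before installing DISC. The following is a minimal sketch, not part of DISC: the helper names `parse_version` and `satisfies` are hypothetical, and the parser only compares the leading numeric components of a version string, ignoring pre-release suffixes.

```python
import re

# Minimum versions copied from the Requirements list above.
MINIMUM_VERSIONS = {
    "tensorflow": "1.13.1",
    "numpy": "1.14.0",
    "pandas": "0.21.0",
    "h5py": "2.9.0",
}
TENSORFLOW_UPPER_BOUND = "2.0.0"  # exclusive upper bound from the list above


def parse_version(version):
    """Turn '1.13.1' or '1.14.0rc1' into a tuple of integers such as (1, 13, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions so that '1.14' compares equal to '1.14.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies(package, installed):
    """Return True if `installed` meets the requirement listed for `package`."""
    version = parse_version(installed)
    if version < parse_version(MINIMUM_VERSIONS[package]):
        return False
    if package == "tensorflow" and version >= parse_version(TENSORFLOW_UPPER_BOUND):
        return False
    return True
```

For an installed package, pass its reported version string, for example `satisfies("numpy", numpy.__version__)`.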
Installation
- Install TensorFlow
If you have an Nvidia GPU, be sure to install a version of TensorFlow that supports it first -- DISC runs much faster with GPU:
pip install "tensorflow-gpu>= 1.13.1,<2.0.0"
We typically use tensorflow-gpu==1.13.1.
Requirements for the GPU version of TensorFlow:
* Hardware
  * NVIDIA GPU card with CUDA Compute Capability 3.5 or higher.
* Software
  * NVIDIA GPU drivers: CUDA 10.0 requires 410.x or higher.
  * CUDA Toolkit: TensorFlow supports CUDA 10.0 (TensorFlow >= 1.13.0).
  * CUPTI, which ships with the CUDA Toolkit.
  * cuDNN SDK (>= 7.4.1).
see this for further information
- Install DISC with pip
To install with pip, run the following from a terminal:
pip install disc
- Install DISC from GitHub
To clone the repository and install manually, run the following from a terminal:
git clone git://github.com/iyhaoo/DISC.git
cd DISC
python setup.py install
Usage
- Quick Start
Run DISC:
disc \
  --dataset=matrix.loom \
  --out-dir=out_dir
where matrix.loom is a loom-formatted raw count matrix with genes in rows and cells in columns, and out_dir is the path of the output directory.
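When DISC is run from a Python script or pipeline rather than from a terminal, the Quick Start command can be assembled as an argument list. This is a minimal sketch that only uses the two flags shown above; the helper names `build_disc_command` and `run_disc` are hypothetical and not part of DISC.

```python
import subprocess


def build_disc_command(dataset, out_dir):
    """Return the Quick Start command as an argument list for subprocess."""
    return ["disc", "--dataset={}".format(dataset), "--out-dir={}".format(out_dir)]


def run_disc(dataset, out_dir):
    """Run DISC and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_disc_command(dataset, out_dir), check=True)
```

Calling `run_disc("matrix.loom", "out_dir")` assumes that the `disc` executable is on the PATH, which should be the case after installation with pip.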
Results:
- log.tsv: a tsv-formatted log file that records training states.
- summary.pdf: a PDF file that visualizes the fitting line and the optimal point; it is updated in real time while DISC is running.
- summary.tsv: a tsv-formatted file containing the raw data behind the visualization.
- result: a directory for imputation results, containing:
  - imputation.loom: a loom-formatted imputed matrix with genes in rows and cells in columns.
  - feature.loom: a loom-formatted, dimensionally reduced feature matrix produced by our method from the imputed matrix above, with features in rows and cells in columns.
  - running_info.hdf5: an hdf5-formatted file that stores basic information about the input dataset, such as library size and the genes used for modelling.
- models: a directory for the trained models saved at every save interval.
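After a run, the output directory can be checked against the list above. The sketch below is illustrative only: the helper `missing_outputs` is hypothetical, and it assumes that the listed files sit directly under the output directory, with the three result files inside `result`.

```python
from pathlib import Path

# Relative paths taken from the Results list above; their placement directly
# under the output directory is an assumption.
EXPECTED_OUTPUTS = [
    "log.tsv",
    "summary.pdf",
    "summary.tsv",
    "result/imputation.loom",
    "result/feature.loom",
    "result/running_info.hdf5",
    "models",
]


def missing_outputs(out_dir):
    """Return the expected outputs that do not exist under `out_dir`."""
    root = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).exists()]
```

An empty list from `missing_outputs("out_dir")` means every expected file and directory was found.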
- Data availability
We provide loom-formatted original, raw, down-sampled (DS), and imputed raw/DS RNA-seq data, as well as FISH data.
- MELANOMA:
  8,640 cells from the melanoma WM989 cell line were sequenced using Drop-seq, where 32,287 genes were detected (MELANOMA). In addition, an RNA FISH experiment across 7,000-88,000 cells from the same cell line was conducted, in which 26 genes were detected (MELANOMA_FISH).
  The original, raw, DS (0.5), imputed raw/DS RNA-seq data and FISH data are provided here.
- SSCORTEX:
  The somatosensory cortex of CD-1 mice at ages p28 and p29 was profiled by 10X, where 7,477 cells were detected (SSCORTEX). In addition, an osmFISH experiment on 4,839 cells from the somatosensory cortex, hippocampus and ventricle of a CD-1 mouse at age p22 was conducted, in which 33 genes were detected (SSCORTEX_FISH).
  The original and raw RNA-seq data and the FISH data are provided here.
- RETINA:
  Retinas of mice at age p14 were profiled in 7 replicates by Drop-seq, where 6,600, 9,000, 6,120, 7,650, 7,650, 8,280, and 4,000 (49,300 in total) STAMPs (single-cell transcriptomes attached to micro-particles) were collected, with 24,658 genes detected in total (RETINA).
  The raw RNA-seq data and the RDS-formatted cluster assignment data from the original study are provided here.
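As a quick arithmetic check, the per-replicate STAMP counts quoted for RETINA add up to the stated total:

```python
# STAMP counts for the 7 RETINA replicates, as listed above.
stamps_per_replicate = [6600, 9000, 6120, 7650, 7650, 8280, 4000]

total_stamps = sum(stamps_per_replicate)
print(total_stamps)  # 49300
```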
- BRAIN_SPLiT:
  156,049 nuclei from the developing brain and spinal cord of mice at age p2 or p11 were profiled by SPLiT-seq, where 26,894 genes were detected (BRAIN_SPLiT).
  The raw RNA-seq data and the RDS-formatted cluster assignment data from the original study are provided here.
- BRAIN_1.3M:
  1,306,127 cells from the combined cortex, hippocampus, and subventricular zone of 2 E18 C57BL/6 mice were profiled by 10X, where 27,998 genes were detected (BRAIN_1.3M).
- Tutorials
- Data preparation and imputation
- Data pre-processing (MELANOMA, SSCORTEX, CBMC, RETINA, BRAIN_SPLiT)
- Run imputation
- Reproducing our results:
- Supplementary topics:
References
Yao He#, Hao Yuan#, Cheng Wu#, Zhi Xie*. "Reliable and efficient imputation and cell type identification for single-cell transcriptomes using a semi-supervised deep learning approach"
History
1.0.2 (2020-01-07)
- Set default values as paper.
1.0.1 (2020-01-06)
- Small bug fixes.
1.0.0 (2019-12-16)
- First release on PyPI.