epicascade

CASCADE: a scCAS cell type annotation method dedicated to differentiating and imbalanced types


Keywords
pip, CASCADE, single-sell
License
MIT
Install
pip install epicascade==0.0.2

Documentation

Accurate Annotation for Differentiating and Imbalanced Cell Types in Single-cell Chromatin Accessibility Data

Installation

install CASCADE from PYPI

pip install CASCADE

Install CASCADE from GitHub

git clone git://github.com/BioX-NKU/CASCADE.git
cd CASCADE
python setup.py install

The dependencies will be automatically installed along with CASCADE.

Quick Start

Input:

h5ad file Files from the training set scCAS data and files from the scCAS data that need to be annotated.

Output:

array Array object with containing cell type annotation results

Use processes:

First, in order to simulate the dataset, we need to split the training set into different cell types and generate the corresponding snap files.

import CASCADE
CASCADE.split.makesnap(train_path,work_path)

Then, we use the simATAC package in R to simulate the data and obtain h5ad file of the simulated data.

library(simATAC)
library(Matrix)
library(SingleCellExperiment)
library(zellkonverter)

simulation<-function(work_path){
    rate <- 6
    typelist = read.csv(paste(work_path,sprintf('typelist.csv',order),sep = ""))
    for (i in c(1:dim(typelist)[1])){
        goal_num = ceiling(colSums(typelist['cell_type'])/dim(typelist)[1]*rate)
        if(goal_num-typelist[i,2]<=5){
            n_cell <- 5
        }
        else{
            n_cell <- goal_num-typelist[i,2]
        }
        celltype <- typelist[i,1]
        print(celltype)
        print(n_cell)
        count <- getCountFromh5(paste(work_path,sprintf('%s.snap',order,celltype),sep = ""))
        object <- simATACEstimate(t(count))
        sim <- simATACSimulate(object, nCells=n_cell)
        print(sim)
        writeH5AD(sim, file=paste(work_path,sprintf('%s.h5ad',order,celltype),sep = ""))
    	}
    }

Finally, we combine the training set with the simulation set, via the run function in CASCADE after data preprocessing together with the test set, and feed training set into model to train, and obtain the annotation results of the given test set.

annotation = runmodel(train_path,test_path,work_path)

The source datasets are available at here.