scDenorm: a denormalization tool for single-cell transcriptomics data


Keywords
nbdev, jupyter, notebook, python, scrna-seq
License
Apache-2.0
Install
pip install scDenorm==0.0.6

Documentation

scDenorm

scDenorm is an algorithm that reverts normalised single-cell omics data to raw counts, preserving the integrity of the original measurements and ensuring consistent data processing during integration.

Install

pip install scDenorm

#or

conda install -c changebio scdenorm

Dependency

numpy
pandas
matplotlib
scanpy
anndata
scipy
tqdm
pathlib
fastcore
colorlog

How to use

Using pbmc3k as an example dataset

import scanpy as sc
from scipy.io import mmwrite
from scDenorm.denorm import *
DEBUG:my_logger:This is a debug message
INFO:my_logger:This is an info message
WARNING:my_logger:This is a warning message
ERROR:my_logger:This is an error message
CRITICAL:my_logger:This is a critical message
ad=sc.datasets.pbmc3k()
ad.layers['count']=ad.X.copy()
ad
AnnData object with n_obs Γ— n_vars = 2700 Γ— 32738
    var: 'gene_ids'
    layers: 'count'
sc.pp.normalize_total(ad, target_sum=1e4)
sc.pp.log1p(ad)
smtx = ad.X.tocsr().asfptype()
smtx.data
array([1.6352079, 1.6352079, 2.2258174, ..., 1.7980369, 1.7980369,
       2.779648 ], dtype=float32)
ad.write_h5ad('data/pbmc3k_norm.h5ad')

write out as sparse matrix

mmwrite('data/scaled.mtx', smtx[1:10,])

In jupyter

Input Anndata

scdenorm('data/pbmc3k_norm.h5ad',fout='data/pbmc3k_denorm.h5ad',verbose=1)
INFO:my_logger:Reading input file: data/pbmc3k_norm.h5ad
/home/huang_yin/anaconda3/envs/sc/lib/python3.9/site-packages/anndata/__init__.py:51: FutureWarning: `anndata.read` is deprecated, use `anndata.read_h5ad` instead. `ad.read` will be removed in mid 2024.
  warnings.warn(
INFO:my_logger:The dimensions of this data are (2700, 32738).
INFO:my_logger:Selecting base
INFO:my_logger:Denormlizing ...the base is 2.718281828459045

b is 2.718281828459045

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2700/2700 [00:02<00:00, 1071.27it/s]
INFO:my_logger:Writing output file: data/pbmc3k_denorm.h5ad

return a new anndata if there is no output path.

new_ad=scdenorm('data/pbmc3k_norm.h5ad')
new_ad
View of AnnData object with n_obs Γ— n_vars = 2700 Γ— 32738
    var: 'gene_ids'
    uns: 'log1p'
ad.layers['count'].data
array([1., 1., 2., ..., 1., 1., 3.], dtype=float32)
new_ad.X.data
array([1.       , 1.       , 2.0000002, ..., 1.       , 1.       ,
       3.       ], dtype=float32)

Input sparse matrix with cell by gene

If it is gene by cell, set gxc=True.

scdenorm('data/scaled.mtx',fout='data/scd_scaled.h5ad')
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9/9 [00:00<00:00, 2883.12it/s]

In command line

Input Anndata

!scdenorm data/pbmc3k_norm.h5ad --fout data/pbmc3k_denorm.h5ad
/home/huang_yin/anaconda3/envs/sc/lib/python3.9/site-packages/anndata/__init__.py:51: FutureWarning: `anndata.read` is deprecated, use `anndata.read_h5ad` instead. `ad.read` will be removed in mid 2024.
  warnings.warn(
b is 2.718281828459045
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2700/2700 [00:02<00:00, 1090.85it/s]

Input sparse matrix with cell by gene

!scdenorm data/scaled.mtx --fout data/scd_scaled_c.h5ad
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9/9 [00:00<00:00, 1333.31it/s]

or output mtx format.

!scdenorm data/scaled.mtx --fout data/scd_scaled_c.mtx
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9/9 [00:00<00:00, 1290.78it/s]

Citation

Yin Huang, Anna Vathrakokili Pournara, Ying Ao, Lirong Yang, Hui Zhang, Yongjian Zhang, Sheng Liu, Alvis Brazma, Irene Papatheodorou, Xinlu Yang, Ming Shi, Zhichao Miao β€œscDenorm: a denormalisation tool for integrating single-cell transcriptomics data”(Under review)