RBCDE is a Python implementation of the rank-biserial correlation coefficient (Cureton, 1956), which can be used as an effect size equivalent of the Wilcoxon test (Kerby, 2014). The Wilcoxon test was in turn found to perform well on single cell data problems (Soneson, 2018), and effect size analyses are recommended for problems with large population sizes (Sullivan, 2012). The package comes with both a scanpy-compatible version and a standalone function that ingests a data matrix and an assignment vector.
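To make the coefficient concrete, here is a minimal sketch of the rank-biserial correlation in its pairwise form: the share of (in-group, out-group) pairs where the in-group value is higher, minus the share where it is lower. This is an illustration only, not RBCDE's own code; the function name `rank_biserial` and the toy expression values are made up for the example, and the quadratic loop is chosen for readability rather than speed.

```python
from itertools import product


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation of group_a versus group_b.

    Illustrative helper (not part of RBCDE). Compares every value in
    group_a against every value in group_b and returns
    (favourable pairs - unfavourable pairs) / total pairs.
    """
    if len(group_a) == 0 or len(group_b) == 0:
        raise ValueError("both groups must be non-empty")
    favourable = 0    # pairs where the group_a value is higher
    unfavourable = 0  # pairs where the group_a value is lower
    for a, b in product(group_a, group_b):
        if a > b:
            favourable += 1
        elif a < b:
            unfavourable += 1
        # tied pairs count towards neither total
    return (favourable - unfavourable) / (len(group_a) * len(group_b))


# Made-up expression values for one gene: cells in a cluster vs. all others.
in_cluster = [2.1, 3.4, 0.0, 2.8]
rest = [0.0, 0.5, 0.0, 1.2, 0.0]
effect = rank_biserial(in_cluster, rest)
print(effect)
```

The result ranges from -1 (every in-group value is lower) to 1 (every in-group value is higher), with 0 meaning no systematic difference between the groups.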
RBCDE depends on numpy, scipy and pandas. The package is available on PyPI and can be installed with pip as follows:
pip3 install rbcde
Usage and Documentation
RBCDE can slot into a scanpy workflow and accepts an object with log(CPM/100 + 1) data stored as a layer or in .raw, and the desired clustering/grouping vector as an .obs column:
import rbcde
rbcde.RBC(adata)
degs, plot_dict = rbcde.filter_markers(adata)
The clus_key argument controls which .obs column is used for the grouping, and use_raw, in combination with the other data-source arguments, instructs the function where to retrieve the expression data from (a layer or .raw, as noted above).
rbcde.filter_markers() takes the computed coefficient values and thresholds them into a list of per-cluster markers. The thresholding can be controlled via the thresh argument, with a range of literature critical values available. A helper dictionary, in the format that scanpy plotting functions accept, is also returned (plot_dict in the call above).
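The thresholding step can be pictured with a small standalone sketch. It assumes the coefficients are held as a nested dictionary of cluster to gene to value, and it uses a hypothetical cutoff of 0.5 with a strict comparison; neither the data layout, the cutoff, nor the function name `threshold_markers` is taken from RBCDE, and the coefficient values are invented for illustration.

```python
def threshold_markers(coefficients, thresh=0.5):
    """Keep, per cluster, the genes whose coefficient exceeds thresh.

    Illustrative helper (not RBCDE's filter_markers).
    coefficients: {cluster: {gene: coefficient}}
    Returns {cluster: [genes]} ordered by decreasing coefficient.
    """
    markers = {}
    for cluster, per_gene in coefficients.items():
        passing = [(gene, r) for gene, r in per_gene.items() if r > thresh]
        # strongest effect sizes first
        passing.sort(key=lambda item: item[1], reverse=True)
        markers[cluster] = [gene for gene, _ in passing]
    return markers


# Invented coefficients for two clusters.
toy_coefficients = {
    "T cells": {"CD3E": 0.82, "MS4A1": -0.40, "LYZ": -0.35},
    "B cells": {"CD3E": -0.45, "MS4A1": 0.91, "CD79A": 0.88, "LYZ": -0.30},
}
marker_dict = threshold_markers(toy_coefficients, thresh=0.5)
print(marker_dict)
```

The output is a mapping from cluster name to a list of gene names. Whether RBCDE's helper dictionary is laid out exactly like this is an assumption; check the function docstrings for the actual return values.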
Analogous functions exist for scanpy-independent data analysis, and can ingest any data matrix with variables as rows and observations as columns. The filtering function does not produce a helper dictionary, only yielding the marker data frame.
results = rbcde.matrix.RBC(data, clusters, vars)
degs = rbcde.matrix.filter_markers(results)
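The expected input orientation (variables as rows, observations as columns, one cluster label per column) can be illustrated with a dependency-free sketch that scores every variable against every cluster. It uses the rank-sum (Mann-Whitney U) form of the coefficient, r = 2U / (n_in * n_out) - 1, and assumes each cluster is compared against all remaining observations. The helper names, the one-vs-rest design, and the toy values are assumptions made for this example and do not describe how rbcde.matrix.RBC works internally.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def one_vs_rest_rbc(data, clusters, var_names):
    """Rank-biserial coefficient of each variable for each cluster vs. the rest.

    data: list of rows, one per variable; each row has one value per observation.
    clusters: one label per observation (i.e. per column of data).
    Returns {variable: {cluster: coefficient}}.
    """
    labels = sorted(set(clusters))
    if len(labels) < 2:
        raise ValueError("need at least two clusters to compare")
    n_total = len(clusters)
    results = {}
    for name, row in zip(var_names, data):
        if len(row) != n_total:
            raise ValueError(f"row {name!r} does not match the cluster vector")
        ranks = average_ranks(row)
        results[name] = {}
        for label in labels:
            inside = [r for r, c in zip(ranks, clusters) if c == label]
            n_in = len(inside)
            n_out = n_total - n_in
            u = sum(inside) - n_in * (n_in + 1) / 2  # Mann-Whitney U of the cluster
            results[name][label] = 2 * u / (n_in * n_out) - 1
    return results


# Toy matrix: 3 variables (rows) x 6 observations (columns).
toy_data = [
    [5, 6, 7, 0, 1, 0],  # high in cluster "a"
    [0, 0, 1, 4, 5, 6],  # high in cluster "b"
    [1, 2, 3, 1, 2, 3],  # no difference between clusters
]
toy_clusters = ["a", "a", "a", "b", "b", "b"]
toy_vars = ["gene1", "gene2", "gene3"]
toy_results = one_vs_rest_rbc(toy_data, toy_clusters, toy_vars)
print(toy_results)
```

Ranking each row once and reusing the ranks for every cluster avoids comparing all pairs of observations, which matters when the number of observations is large.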
An HTML render of the RBCDE function docstrings, detailing all the parameters, can be accessed at ReadTheDocs.
rbc_demo.ipynb computes the rank-biserial correlation coefficient for demonstration 10X PBMC data, yielding a similar standard of markers to established approaches while reporting only ~13% of the gene total. This more compact summary does not require any heuristic filtering to obtain. The full marker export yielded by the analysis can be found at