Authors: Ted Verhey, Sorana Morrissy
Contributors: Hyojin Song, Aaron Gillmor, Gurveer Gill, Courtney Hall
mosaicMPI is a Python package for mosaic integration of bulk, single-cell, and spatial expression data at the level of programs. Programs are first discovered by unsupervised deconvolution (consensus non-negative matrix factorization, cNMF), run separately on each dataset across multiple ranks. A flexible network-based approach then groups similar programs into communities across resolutions and datasets. Program communities are interpreted using sample/cell metadata and gene set analyses. Integrative program communities enable metadata transfer across datasets.
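The workflow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not mosaicMPI's implementation or API: it uses plain multiplicative-update NMF in place of cNMF, links programs by Pearson correlation over whatever genes two programs share, and treats connected components as communities. All function names, the synthetic data, and the `min_corr` threshold are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=300, seed=0):
    """Plain multiplicative-update NMF (a stand-in for cNMF): X ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3
    H = rng.random((k, X.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def discover_programs(X, genes, ranks, dataset):
    """Factorize one dataset at several ranks; key programs by (dataset, rank, index)."""
    programs = {}
    for k in ranks:
        _, H = nmf(X, k)
        for i in range(k):
            programs[(dataset, k, i)] = dict(zip(genes, H[i]))
    return programs

def communities(programs, min_corr=0.5):
    """Link programs correlated over their shared genes; return connected components."""
    ids = list(programs)
    parent = {p: p for p in ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            shared = sorted(set(programs[a]) & set(programs[b]))
            if len(shared) < 3:
                continue
            va = np.array([programs[a][g] for g in shared])
            vb = np.array([programs[b][g] for g in shared])
            if va.std() == 0 or vb.std() == 0:
                continue
            if np.corrcoef(va, vb)[0, 1] >= min_corr:
                parent[find(a)] = find(b)
    groups = {}
    for p in ids:
        groups.setdefault(find(p), []).append(p)
    return list(groups.values())

# Synthetic example: two datasets with only partially overlapping genes.
rng = np.random.default_rng(1)
all_genes = [f"g{i}" for i in range(60)]
true_H = rng.gamma(0.3, 1.0, size=(3, 60))  # three latent programs

def simulate(n_samples, gene_idx):
    usage = rng.random((n_samples, 3))
    noise = 0.01 * rng.random((n_samples, len(gene_idx)))
    return usage @ true_H[:, gene_idx] + noise

bulk_idx, sc_idx = np.arange(0, 50), np.arange(10, 60)
programs = {}
programs.update(discover_programs(
    simulate(40, bulk_idx), [all_genes[i] for i in bulk_idx], (2, 3), "bulk"))
programs.update(discover_programs(
    simulate(80, sc_idx), [all_genes[i] for i in sc_idx], (2, 3), "single_cell"))

groups = communities(programs)
for g in groups:
    print(sorted(g))
```

Because each dataset is factorized on its own gene set and programs are compared only over the genes they happen to share, neither dataset has to be subset to a common feature list before factorization, and a further dataset could be added by factorizing it alone and re-running the grouping step.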
Here are just a few of the things that mosaicMPI does well:
- Identifies interpretable, non-negative programs at multiple resolutions
- Mosaic integration does not require subsetting features/genes to a shared or overdispersed subset
- Multi-omics integration without shared sample IDs
- Incremental integration (adding datasets one at a time) since deconvolution is performed independently on each dataset
- High-performance integration of datasets with mismatched features (e.g., microarray, RNA-Seq, proteomics) or sparsity (e.g., single-cell vs. bulk)
- Metadata transfer across datasets
mosaicMPI is usable via:
- command-line interface for rapid data exploration and integration
- Python interface for extensibility and flexibility
mosaicMPI is compatible with macOS, Windows, and Linux systems. Memory usage depends on the size and number of datasets.
Install the package with conda:
# create an environment called mosaic and install
conda create -n mosaic -c conda-forge mosaicmpi
conda activate mosaic
For ssGSEA analysis, you will also need to install GSEApy into the same environment.
# if you have conda (MacOS_x86-64 and Linux only)
conda install -c bioconda gseapy
# Windows and MacOS_ARM64 (M1/2-Chip)
pip install gseapy
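After installing, a quick check from Python can confirm that both packages are visible in the active environment. This is a minimal sketch using only the standard library; the import name `mosaicmpi` is an assumption based on the conda package name, and `check_installed` is a hypothetical helper, not part of mosaicMPI.

```python
import importlib.util

def check_installed(modules=("mosaicmpi", "gseapy")):
    """Return {module_name: bool} for whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

status = check_installed()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If either package is reported as missing, make sure the `mosaic` environment is activated before running the check.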
Read the documentation.
For questions that arise while using mosaicMPI, browse existing issues or open a new one in the GitHub "Issues" tab.