Single-cell trajectory integration




Trajectorama is an algorithm that implements coexpression-based integration of multi-study single-cell trajectories. Trajectorama is described in the paper "Coexpression enables multi-study cellular trajectories of development and disease" by Brian Hie, Hyunghoon Cho, Bryan Bryson, and Bonnie Berger.


The most important dependency is a custom implementation of Louvain clustering, which can be installed with the commands below:

# Binary dependency (for Ubuntu/Debian).
sudo apt-get install bison flex

git clone
cd louvain-igraph
python install

Installing Trajectorama can then be done by:

python -m pip install trajectorama

API and example usage

We provide a basic API around the core algorithm that takes an expression matrix augmented with study information and returns a list of coexpression matrices, with corresponding indices into the original data:

import trajectorama

X = [ ... ] # Sample-by-gene expression matrix.
studies = [ ... ] # Study identifiers, one for each row of `X`.

Xs_coexpr, sample_idxs = trajectorama.transform(
    X, studies,
)

The coexpression matrix Xs_coexpr[i] is defined over the subset of cells X[sample_idxs[i], :]. See the documentation string under the transform() function at the top of trajectorama/ for the full list of parameters and default values.
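To make the relationship between the two return values concrete, the sketch below shows one way to trace each coexpression matrix back to the cells and studies it was computed from. It does not call Trajectorama: `sample_idxs` here is a hand-written toy stand-in for the second return value of transform(), and `describe_groups` is a hypothetical helper that is not part of the package.

```python
import numpy as np

def describe_groups(X, studies, sample_idxs):
    """Summarize the cells behind each coexpression matrix (hypothetical helper)."""
    X = np.asarray(X)
    studies = np.asarray(studies)
    summaries = []
    for i, idx in enumerate(sample_idxs):
        idx = np.asarray(idx, dtype=int)
        X_sub = X[idx, :]  # Cells that Xs_coexpr[i] would be defined over.
        names, counts = np.unique(studies[idx], return_counts=True)
        summaries.append({
            "matrix": i,
            "n_cells": X_sub.shape[0],
            "studies": dict(zip(names.tolist(), counts.tolist())),
        })
    return summaries

# Toy stand-ins (assumptions, not real Trajectorama output).
X = np.arange(12, dtype=float).reshape(6, 2)  # 6 cells, 2 genes.
studies = ["a", "a", "b", "b", "b", "c"]      # One study label per cell.
sample_idxs = [[0, 1, 2], [3, 4, 5]]          # Row indices per matrix.

summaries = describe_groups(X, studies, sample_idxs)
for summary in summaries:
    print(summary)
```

A summary like this is a quick way to check whether a given coexpression matrix draws its cells from one study or from several.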

This list of coexpression matrices can then be used in further analysis. For example, you can flatten the matrices and use Scanpy to visualize them as a KNN graph based on distance in coexpression space:

from anndata import AnnData
import numpy as np
import scanpy as sc
from scipy.sparse import csr_matrix

# Save upper triangle and flatten.
n_features = X.shape[1]
triu_idx = np.triu_indices(n_features) # Indices of upper triangle.
# Stack one flattened row per coexpression matrix.
X_coexpr = np.vstack([
    X_coexpr_i[triu_idx].flatten() for X_coexpr_i in Xs_coexpr
])
X_coexpr = csr_matrix(X_coexpr)

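# Plot KNN graph in coexpression space.
adata = AnnData(X_coexpr)
sc.pp.neighbors(adata)   # Build the KNN graph.
sc.tl.draw_graph(adata)  # Compute a force-directed layout.
sc.pl.draw_graph(adata)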
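The flattening step can be checked in isolation with NumPy alone. The following is a minimal sketch under stated assumptions: the toy matrices are plain Pearson correlation matrices of random data, used only as stand-ins for Trajectorama output, and `flatten_upper_triangles` is a hypothetical helper. Because a symmetric gene-by-gene matrix is fully described by its upper triangle, each matrix becomes a row of length n_features * (n_features + 1) / 2.

```python
import numpy as np

def flatten_upper_triangles(matrices):
    """Stack the upper triangle of each square matrix as one row (hypothetical helper)."""
    matrices = [np.asarray(m) for m in matrices]
    n_features = matrices[0].shape[0]
    triu_idx = np.triu_indices(n_features)  # Includes the diagonal.
    return np.vstack([m[triu_idx] for m in matrices])

# Toy stand-ins: three correlation matrices over four "genes".
rng = np.random.default_rng(0)
n_features = 4
toy_matrices = []
for _ in range(3):
    cells = rng.normal(size=(20, n_features))  # 20 cells by 4 genes.
    toy_matrices.append(np.corrcoef(cells, rowvar=False))

flat = flatten_upper_triangles(toy_matrices)
print(flat.shape)  # One row per matrix.

# Pairwise Euclidean distances between matrices in coexpression space.
diffs = flat[:, None, :] - flat[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))
```

Distances like `dists` are what a KNN graph over the flattened rows is built from, so inspecting them on a small example can help confirm the rows are laid out as intended.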

The example scripts below, which were used to generate the results in the paper, show more detailed usage of Trajectorama.


Trajectorama for mouse neuronal development

Trajectorama analyzes five large-scale studies of mouse neurons over multiple points in development.

Data can be found at and can be downloaded as:

tar xvf data.tar.gz

To preprocess the data, run the command:

python bin/ conf/mouse_develop.txt

This preprocessing step only needs to be done once. Then, we perform panclustering and coexpression matrix computation using the command:

python bin/ > mouse_develop.log

This will save each coexpression matrix as a .npz file to a directory under target/sparse_correlations/. Computing all coexpression matrices should complete in around an hour when running on a single core.
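Before running the downstream analysis, it can be useful to confirm what was cached. The sketch below lists the arrays stored in every .npz file under a directory without assuming how the files were written; `inspect_npz_dir` is a hypothetical helper, and the demo writes a toy file to a temporary directory rather than reading real output. To inspect an actual run, one would point it at target/sparse_correlations/.

```python
import tempfile
from pathlib import Path

import numpy as np

def inspect_npz_dir(root):
    """Map each .npz file under `root` to the names and shapes of its arrays."""
    contents = {}
    for path in sorted(Path(root).rglob("*.npz")):
        with np.load(path) as archive:
            contents[str(path)] = {
                name: archive[name].shape for name in archive.files
            }
    return contents

# Demo on a toy file; the directory layout and array name are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "sparse_correlations" / "demo"
    demo_dir.mkdir(parents=True)
    np.savez(demo_dir / "toy.npz", corr=np.eye(3))
    demo_contents = inspect_npz_dir(tmp)

for filename, arrays in demo_contents.items():
    print(filename, arrays)
```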

The downstream analysis can then be performed on these cached matrices using the commands:

python bin/ >> mouse_develop.log
python bin/ >> mouse_develop.log

This will log some relevant statistics and save visualizations under the figures/ directory.

Trajectorama for human hematopoiesis

We can perform a similar workflow for human hematopoiesis by running the commands:

# Download (if not already done for the mouse data).
tar xvf data.tar.gz

# Preprocess.
python bin/ conf/hematopoiesis.txt

# Analyze.
python bin/ > hematopoiesis.log
python bin/ >> hematopoiesis.log
python bin/ >> hematopoiesis.log

Trajectorama for microglia

We can perform a similar workflow for mouse and human microglia in various conditions by running the commands:

# Download (if not already done for the mouse data).
tar xvf data.tar.gz

# Preprocess.
python bin/ conf/microglia.txt

# Analyze.
python bin/ > microglia.log
python bin/ >> microglia.log


Create an issue in the repository or contact for any pertinent questions or concerns. We will do our best to answer promptly. Feel free to create a pull request and contribute!