cassiopeia-lineage

Single Cell Lineage Reconstruction with Cas9-Enabled Lineage Recorders


Keywords
scLT, computational-biology, computational-phylogenetics, crispr-cas9, lineage-tracing, single-cell-lineage-tracing, single-cell-rna-seq
License
MIT
Install
pip install cassiopeia-lineage==1.0.4

Documentation

Updates (April 16, 2020)

We now introduce the FitchCount algorithm in Cassiopeia's Analysis module. Briefly, FitchCount is an efficient algorithm for aggregating the number of state transitions across all optimal evolutionary histories (under the maximum parsimony criterion), given that the states of the leaves are known. It builds on the Fitch-Hartigan algorithm for ancestral state assignment (i.e. the Small Parsimony Problem; Fitch 1971 & Hartigan 1973).

You can access the algorithm in the cassiopeia.Analysis.reconstruct_states module via the fitch_count function. The function takes a Networkx tree and a Pandas series mapping each leaf to a state, and returns a square count matrix M that summarizes the number of times each state transitioned to every other state across all optimal solutions to the small parsimony problem, as given by the Fitch-Hartigan algorithm.

You can invoke the algorithm as such:

from cassiopeia.Analysis.reconstruct_states import fitch_count

# tree is a networkx object over Cassiopeia Nodes;
# meta['tissue_sample'] is a pandas Series mapping each leaf to its state
M = fitch_count(tree, meta['tissue_sample'])

We are in the process of putting together a notebook tutorial, so stay tuned!
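For intuition about what fitch_count builds on, here is a minimal sketch of the bottom-up pass of the Fitch-Hartigan algorithm on a toy networkx tree. It is not Cassiopeia's implementation and it does not compute the FitchCount matrix M; the function name fitch_hartigan_bottom_up, the plain string node names, and the tissue labels are assumptions made for illustration.

```python
from collections import Counter

import networkx as nx


def fitch_hartigan_bottom_up(tree, root, leaf_states):
    """Return (candidate state set per node, parsimony score).

    Illustrative only: for each internal node, keep the states that occur
    in the largest number of child sets, and charge one change for every
    child whose set does not contain such a state.
    """
    state_sets = {}
    score = 0
    # Post-order traversal visits all children before their parent.
    for node in nx.dfs_postorder_nodes(tree, source=root):
        children = list(tree.successors(node))
        if not children:
            # Leaves carry their observed state.
            state_sets[node] = {leaf_states[node]}
            continue
        counts = Counter(s for child in children for s in state_sets[child])
        best = max(counts.values())
        state_sets[node] = {s for s, c in counts.items() if c == best}
        score += len(children) - best
    return state_sets, score


# Hypothetical toy tree: root -> (a, b), a -> (l1, l2), b -> (l3, l4)
tree = nx.DiGraph()
tree.add_edges_from([
    ("root", "a"), ("root", "b"),
    ("a", "l1"), ("a", "l2"),
    ("b", "l3"), ("b", "l4"),
])
leaf_states = {"l1": "lung", "l2": "lung", "l3": "liver", "l4": "lung"}

state_sets, score = fitch_hartigan_bottom_up(tree, "root", leaf_states)
print(state_sets["root"], score)
```

On this toy tree a single state change explains the leaves. FitchCount goes further by counting transitions across every optimal assignment rather than reporting one score.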

Updates (Feb. 9, 2020)

We have some updated features in our most current release:

  • LCA-based Hybrid Switching: we've found mixed results in using cell-number-based cutoffs in Cassiopeia-Hybrid and have thus started using the distance to the latest-common-ancestor (LCA) of a given group of cells as the determining factor for transitioning between Greedy and ILP. We recommend using values between 10 and 20. You can enable this behavior with hybrid_lca_mode, which causes the cutoff parameter to be interpreted as an LCA distance.
  • Additional approaches for missing data handling: in our Cassiopeia-Greedy approach (which Hybrid also uses), we now support different modes for missing data handling: (1) a K-nearest-neighbor approach, which classifies cells with missing data based on where their K closest 'friends' were assigned; and (2) a lookahead approach, which uses future Greedy splits to assign cells with missing data. You can specify which mode you'd like to use with greedy_missing_data_mode, which can be knn, avg, or lookahead.
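
The K-nearest-neighbor idea can be sketched as follows. This is a simplified illustration and not Cassiopeia's code: the similarity measure (number of shared mutated states), the "-" and "0" encodings for missing and unedited sites, the "left"/"right" side labels, and the function names are all assumptions.

```python
from collections import Counter

MISSING = "-"   # assumed encoding of missing data
UNEDITED = "0"  # assumed encoding of an unedited site


def similarity(a, b):
    """Count characters at which both cells carry the same mutated state."""
    return sum(
        1 for x, y in zip(a, b)
        if x == y and x not in (MISSING, UNEDITED)
    )


def assign_missing_knn(cells, assigned, k=3):
    """Place each unassigned cell on the side chosen by its k most similar
    already-assigned cells (majority vote)."""
    result = {}
    for name, states in cells.items():
        if name in assigned:
            continue
        neighbours = sorted(
            assigned,
            key=lambda other: similarity(states, cells[other]),
            reverse=True,
        )[:k]
        votes = Counter(assigned[n] for n in neighbours)
        result[name] = votes.most_common(1)[0][0]
    return result


# Hypothetical split on state "1" of the first character:
# c1 and c2 carry it, c3 does not, c4 is missing at that character.
cells = {
    "c1": ["1", "2", "0"],
    "c2": ["1", "2", "3"],
    "c3": ["0", "0", "4"],
    "c4": ["-", "2", "3"],
}
assigned = {"c1": "left", "c2": "left", "c3": "right"}

print(assign_missing_knn(cells, assigned, k=2))
```

Here c4 shares mutations with c1 and c2, so it follows them to the side that carries the split mutation.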

As a reminder, you can look at all parameters that reconstruct-lineage and stress-test allow by using the -h flag.

Cassiopeia

This is a software suite for processing data from single cell lineage tracing experiments. The suite comes equipped with three main modules:

  • Target Site Sequencing Pipeline: a pipeline for extracting lineage information from raw fastqs produced from a lineage tracing experiment.
  • Phylogeny Reconstruction: a collection of tools for constructing phylogenies. We support 5 algorithms currently: a greedy algorithm based on multi-state compatibility, an exact Steiner-Tree solver, Cassiopeia (the combination of these two), Neighbor-Joining, and Camin-Sokal Maximum Parsimony.
  • Benchmarking: a set of tools for benchmarking, including a simulation framework and tree comparison tools.
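
As a rough illustration of the greedy idea in the Phylogeny Reconstruction module, the sketch below performs a single split of a toy character matrix on its most frequent mutation. It is a simplification under stated assumptions, not the package's algorithm: the function name greedy_split, the "0" (unedited) and "-" (missing) encodings, and the toy data are hypothetical.

```python
from collections import Counter

UNEDITED = "0"  # assumed encoding of an unedited site
MISSING = "-"   # assumed encoding of missing data


def greedy_split(matrix):
    """Split cells on the most frequent (character index, state) mutation.

    Returns the chosen mutation, the cells carrying it, the cells without
    it, and the cells whose state at that character is missing.
    """
    counts = Counter(
        (i, state)
        for states in matrix.values()
        for i, state in enumerate(states)
        if state not in (UNEDITED, MISSING)
    )
    if not counts:
        # No mutations observed, so there is nothing to split on.
        return None, [], sorted(matrix), []
    (char, state), _ = counts.most_common(1)[0]
    carriers, others, ambiguous = [], [], []
    for cell, states in sorted(matrix.items()):
        if states[char] == state:
            carriers.append(cell)
        elif states[char] == MISSING:
            ambiguous.append(cell)
        else:
            others.append(cell)
    return (char, state), carriers, others, ambiguous


# Hypothetical character matrix: cell -> observed state per character
matrix = {
    "c1": ["1", "2", "0"],
    "c2": ["1", "0", "3"],
    "c3": ["0", "0", "4"],
    "c4": ["-", "2", "3"],
    "c5": ["1", "0", "0"],
}

mutation, carriers, others, ambiguous = greedy_split(matrix)
print(mutation, carriers, others, ambiguous)
```

Applying the same split recursively to each group would yield a tree; the ambiguous cells are the ones a missing-data strategy has to place.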

You can find all documentation here

You can also find example notebooks in this repository:

Free Software: MIT License

Installation

  1. Clone the package as so: git clone https://github.com/YosefLab/Cassiopeia.git

  2. Ensure that you have python3.6 installed. You can install this via pip.

  3. Make sure that Gurobi is installed. You can follow the instructions listed here. To verify that it's working correctly, use the following tests:

    • Run the command gurobi.sh from a terminal window
    • From the Gurobi installation directory (where there is a setup.py file), use python setup.py install --user
  4. Make sure that Emboss is properly configured and installed; oftentimes users may see a "command not found" error when attempting to align with the align_sequences function we have provided. This is most likely because the Emboss binaries have not been added to your PATH variable. For details on how to download, configure, and install the Emboss package, refer to this tutorial.

  5. One of Cassiopeia's dependencies, pysam, requires HTSLib to be installed. You can read about pysam's requirements here.

  6. Ensure that Cython is installed. You can do this via python3.6 -m pip install --user cython.

  7. While we get pip working, it's best to first clone the package and then follow these instructions:

    • python3.6 setup.py build
    • python3.6 setup.py build_ext --inplace
    • python3.6 -m pip install . --user

To verify that it installed correctly, try using the package in a python session: import cassiopeia. Then, to make sure that the command-line tools work, try reconstruct-lineage -h and confirm that you get the usage details.
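
The checks above can also be scripted. The following is a small, hedged sketch using only the standard library; the helper check_environment is hypothetical, and the module names Cython, pysam, and gurobipy are the usual import names for those dependencies rather than something this README specifies.

```python
import importlib.util
import shutil


def check_environment(modules, commands):
    """Report which top-level Python modules are importable and which
    command-line tools are found on PATH."""
    report = {}
    for name in modules:
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in commands:
        report[f"command:{name}"] = shutil.which(name) is not None
    return report


report = check_environment(
    modules=["cassiopeia", "Cython", "pysam", "gurobipy"],
    commands=["reconstruct-lineage", "gurobi.sh"],
)
for item, ok in report.items():
    print(f"{item}: {'found' if ok else 'MISSING'}")
```

A MISSING entry for a command usually means the installation directory is not on your PATH, as described for Emboss in step 4.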

Command Line Tools

In addition to allowing users to use Cassiopeia from a python session, we provide five unique command line tools for common pipeline procedures:

  • reconstruct-lineage: Reconstructs a lineage from a provided character matrix (consisting of cells x characters where each element is the observed state of that character in that cell).
  • post-process-tree: Post-process trees after reconstruction to assign sample identities back to the leaves of the tree and remove any leaves that don't correspond to a sample in the character matrix.
  • stress-test: Conduct stress testing on a given simulated tree. Writes out a new tree file after inferring a tree from the unique leaves of the "true", simulated tree.
  • call-lineages: Perform lineage group calling from a molecule table.
  • filter-molecule-table: Perform molecule table filtering.

All usage details can be found by using the -h flag.
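
To make the character matrix described for reconstruct-lineage concrete, here is a toy cells x characters table built with pandas, along with a quick per-character summary. The cell and character names, the "0" (unedited) and "-" (missing) encodings, and the summarize helper are assumptions for illustration; check the -h output for the input format each tool actually expects.

```python
import pandas as pd

# Hypothetical character matrix: rows are cells, columns are characters,
# and each entry is the observed state of that character in that cell.
matrix = pd.DataFrame(
    {
        "r1": ["1", "1", "0", "-"],
        "r2": ["2", "2", "0", "2"],
        "r3": ["0", "3", "4", "3"],
    },
    index=pd.Index(["c1", "c2", "c3", "c4"], name="cell"),
)


def summarize(matrix, missing="-"):
    """Per-character fraction of missing entries and number of distinct
    observed states (the unedited state counts as a state)."""
    is_missing = matrix.eq(missing)
    return pd.DataFrame(
        {
            "missing_fraction": is_missing.mean(axis=0),
            "n_states": matrix.mask(is_missing).nunique(axis=0),
        }
    )


summary = summarize(matrix)
print(summary)
```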