Work with Loom files for single-cell RNA-seq data


Keywords
loom, omics, transcriptomics, bioinformatics
License
BSD-3-Clause
Install
pip install loompy==3.0.7

loompy

★ This repository is under construction, and not yet ready for public use. Be patient.

.loom is an efficient file format for very large omics datasets, consisting of a main matrix and a variable number of row and column annotations. We use loom files to store single-cell gene expression data: the main matrix contains the actual expression values (one column per cell, one row per gene); row and column annotations contain metadata for genes and cells, such as Name, Chromosome, Position (for genes), and Strain, Sex, Age (for cells).
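The layout described above can be pictured with a small toy model. This is only a sketch in plain Python: the class name `ToyLoom`, the gene names, and all values are made up for illustration, and none of it is the loompy API. It shows the one structural rule the format implies: every row attribute has one value per gene, and every column attribute has one value per cell.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLoom:
    """Illustrative stand-in for the loom layout (hypothetical, not loompy)."""
    matrix: list                                   # rows = genes, columns = cells
    row_attrs: dict = field(default_factory=dict)  # one value per gene
    col_attrs: dict = field(default_factory=dict)  # one value per cell

    def __post_init__(self):
        n_rows, n_cols = self.shape
        if any(len(row) != n_cols for row in self.matrix):
            raise ValueError("matrix rows must all have the same length")
        for name, values in self.row_attrs.items():
            if len(values) != n_rows:
                raise ValueError(f"row attribute {name!r} needs {n_rows} values")
        for name, values in self.col_attrs.items():
            if len(values) != n_cols:
                raise ValueError(f"column attribute {name!r} needs {n_cols} values")

    @property
    def shape(self):
        # (number of genes, number of cells)
        return (len(self.matrix), len(self.matrix[0]) if self.matrix else 0)


# Made-up data: 2 genes x 3 cells.
toy = ToyLoom(
    matrix=[[0, 3, 1], [5, 0, 2]],
    row_attrs={"Name": ["GeneA", "GeneB"], "Chromosome": ["1", "2"]},
    col_attrs={"Sex": ["F", "M", "F"], "Age": [21, 28, 21]},
)
print(toy.shape)
```

A column attribute with only two values would be rejected here, because the matrix has three cells.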

Loom files (.loom) are created in the HDF5 file format, which supports an internal collection of numerical multidimensional datasets. HDF5 is supported by many programming languages, including Java, MATLAB, Mathematica, Python, R, and Julia, so .loom files are accessible from any of them.

Installation

Use pip from your terminal:

pip install loompy

Note: loompy has some prerequisites, which pip installs along with it. If you use the popular Anaconda Python distribution, all prerequisites are already installed.

Getting started

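import loompy
ds = loompy.connect("cortex.loom")  # opens a connection; the matrix stays on disk
print(ds.row_attrs.keys())          # names of all row attributes
ds.close()                          # release the file when done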

This will print the names of all the row attributes in the file.

Understanding the semantics of loom files

Connecting, not loading and saving

Loom files are stored on disk and are never loaded entirely. They are more like databases: you connect, retrieve some subset of the data, maybe update some attributes.
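The idea of connecting and retrieving only a subset can be illustrated without any HDF5 machinery. The sketch below uses a plain binary file and the standard library, so it is an analogy and not how loompy or HDF5 work internally; the file name, the matrix size, and the helper `read_row` are all assumptions made for the example. Opening the file reads nothing, and fetching one row reads only that row's bytes.

```python
import os
import struct
import tempfile

N_ROWS, N_COLS = 4, 5
ROW_BYTES = N_COLS * 4  # each value is stored as a 4-byte little-endian float


def read_row(handle, row):
    """Seek to one row of the on-disk matrix and decode only that row."""
    handle.seek(row * ROW_BYTES)
    return struct.unpack(f"<{N_COLS}f", handle.read(ROW_BYTES))


# Write a small toy matrix to disk, row by row (value = row * 10 + column).
path = os.path.join(tempfile.mkdtemp(), "toy_matrix.bin")
with open(path, "wb") as out:
    for r in range(N_ROWS):
        values = [float(r * 10 + c) for c in range(N_COLS)]
        out.write(struct.pack(f"<{N_COLS}f", *values))

# "Connect": opening the file does not read the matrix.
with open(path, "rb") as handle:
    third_row = read_row(handle, 2)  # only ROW_BYTES bytes are read

print(third_row)  # (20.0, 21.0, 22.0, 23.0, 24.0)
```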

Reading and writing

Loom files are based on HDF5, a file format suitable for large multidimensional datasets. They are designed to be mostly created once, then used as read-only. They do not support writing and reading concurrently. They also do not support journalling, so if something goes wrong during a write, the entire file can be lost. Therefore, do not use loom files as your primary data storage. They are for working with data, not keeping it safe.
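One way to act on this advice is to keep a master copy elsewhere and do all writing on a scratch copy. The helper below is a minimal sketch using only the standard library; the function name `working_copy` and the directory prefixes are hypothetical, and the demo uses a placeholder file, not a real loom file.

```python
import os
import shutil
import tempfile


def working_copy(source_path, work_dir=None):
    """Copy source_path into work_dir and return the path of the copy."""
    if work_dir is None:
        work_dir = tempfile.mkdtemp(prefix="loom_work_")
    os.makedirs(work_dir, exist_ok=True)
    target = os.path.join(work_dir, os.path.basename(source_path))
    shutil.copy2(source_path, target)  # the master file is only ever read
    return target


# Demo with a placeholder file standing in for a real .loom file.
master_dir = tempfile.mkdtemp(prefix="loom_master_")
master = os.path.join(master_dir, "cortex.loom")
with open(master, "wb") as handle:
    handle.write(b"placeholder bytes, not a real loom file")

scratch = working_copy(master)
print(scratch != master, os.path.getsize(scratch) == os.path.getsize(master))
```

If a write to the scratch copy is interrupted, the master file is untouched and a fresh copy can be made.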

Loom files are great for distribution of large datasets, which are then used as read-only for analytical purposes.

Efficient indexing

The main matrix is stored in chunked format. That is, instead of being stored by rows or by columns, it is stored as a sequence of little rectangles. As a consequence, both rows and columns (as well as submatrices) can be efficiently accessed.
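A rough count makes the benefit concrete. The sketch below estimates how many stored values must be read to fetch one full row or one full column under three layouts. The matrix size and the 64 x 64 chunk shape are assumptions chosen for illustration, and edge chunks are counted as full, so the numbers are upper bounds.

```python
import math


def values_read(n_rows, n_cols, chunk_rows, chunk_cols):
    """Return (values read for one full row, values read for one full column).

    Whole chunks are always read, so each count is
    (number of chunks crossed) * (values per chunk).
    """
    chunk_size = chunk_rows * chunk_cols
    chunks_per_row = math.ceil(n_cols / chunk_cols)  # chunks spanning the width
    chunks_per_col = math.ceil(n_rows / chunk_rows)  # chunks spanning the height
    return chunks_per_row * chunk_size, chunks_per_col * chunk_size


n_rows, n_cols = 20_000, 100_000  # hypothetical: genes x cells

by_rows = values_read(n_rows, n_cols, 1, n_cols)     # each chunk is one row
by_cols = values_read(n_rows, n_cols, n_rows, 1)     # each chunk is one column
rectangles = values_read(n_rows, n_cols, 64, 64)     # small rectangular chunks

print(by_rows)     # (100000, 2000000000)
print(by_cols)     # (2000000000, 20000)
print(rectangles)  # (6402048, 1282048)
```

With storage by rows, reading one column touches every value in the matrix, and storage by columns has the mirror-image problem. With rectangular chunks, neither direction is optimal, but both stay far below a full scan.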

Documentation

The loompy API is documented at loompy.readthedocs.org.