★ This repository is under construction, and not yet ready for public use. Be patient.
.loom is an efficient file format for very large omics datasets,
consisting of a main matrix and a variable number of row and column
annotations. We use loom files to store single-cell gene expression
data: the main matrix contains the actual expression values (one
column per cell, one row per gene); row and column annotations
contain metadata for genes and cells, such as Position (for genes) and
Age (for cells).
Loom files (.loom) are created in the HDF5 file format, which
supports an internal collection of numerical multidimensional datasets.
HDF5 is supported by many programming languages, including Java, MATLAB,
Mathematica, Python, R, and Julia, so .loom files are accessible from
any language that supports HDF5.
Use pip from your terminal:
pip install loompy
Note: there are some prerequisites, which will be installed along with loompy. If you use the popular Anaconda Python distribution, all prerequisites will have already been installed.
import loompy
ds = loompy.connect("cortex.loom")
print(ds.row_attrs.keys())
This will print the names of all the row attributes in the file.
Understanding the semantics of loom files
Connecting, not loading and saving
Loom files are stored on disk and are never loaded entirely. They are more like databases: you connect, retrieve some subset of the data, maybe update some attributes.
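As a rough illustration of the connect-then-retrieve pattern, the sketch below uses h5py (a general-purpose HDF5 binding for Python, not the loompy API) on a tiny stand-in file. The file name demo.h5, the dataset name matrix, and the path row_attrs/Position are assumptions chosen for this example and should not be read as a specification of the loom layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file (hypothetical names, chosen for this example).
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("matrix", data=np.arange(20, dtype="float32").reshape(4, 5))
    f.create_dataset("row_attrs/Position", data=np.array([10, 20, 30, 40]))

# "Connect": opening the file does not read the matrix values.
with h5py.File(path, "r") as f:
    shape = f["matrix"].shape               # available without reading values
    first_row = f["matrix"][0, :]           # reads only the requested row
    positions = f["row_attrs/Position"][:]  # reads one annotation vector

print(shape, first_row, positions)
```

Only the slices requested inside the second with-block are copied into memory; the rest of the matrix stays on disk.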
Reading and writing
Loom files are based on HDF5, a file format suitable for large multidimensional datasets. They are designed to be mostly created once, then used as read-only. They do not support writing and reading concurrently. They also do not support journalling, so if a write is interrupted (for example by a crash), the entire file can be lost. Therefore, do not use loom files as your primary data storage. They are for working with data, not keeping it safe.
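One way to follow this advice is to leave the original file untouched and direct all writes to a copy. The sketch below uses only the Python standard library; the helper name working_copy and the directory name scratch are hypothetical, and the placeholder bytes stand in for a real loom file.

```python
import os
import shutil
import tempfile

def working_copy(master_path, work_dir):
    """Copy the master file into work_dir and return the copy's path.

    Writes should then go to the copy, so an interrupted write
    cannot damage the master file.
    """
    os.makedirs(work_dir, exist_ok=True)
    copy_path = os.path.join(work_dir, os.path.basename(master_path))
    shutil.copy2(master_path, copy_path)
    return copy_path

# Demonstration with a placeholder file standing in for a real .loom file.
base = tempfile.mkdtemp()
master = os.path.join(base, "cortex.loom")
with open(master, "wb") as fh:
    fh.write(b"placeholder bytes")

copy = working_copy(master, os.path.join(base, "scratch"))
print(copy != master, os.path.exists(master))
```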
Loom files are great for distribution of large datasets, which are then used as read-only for analytical purposes.
The main matrix is stored in chunked format. That is, instead of being stored by rows or by columns, it is stored as a sequence of little rectangles. As a consequence, both rows and columns (as well as submatrices) can be efficiently accessed.
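To make the effect of chunking concrete, the following sketch counts how many chunks must be read to fetch one full row or one full column. The matrix shape and the 64 by 64 chunk size are made-up numbers for illustration; the chunk shape used in a given loom file may differ.

```python
import math

def chunks_touched(chunk_rows, chunk_cols, row_span, col_span):
    """Count chunks overlapping the half-open ranges row_span x col_span."""
    r0, r1 = row_span
    c0, c1 = col_span
    # Last chunk index minus first chunk index, plus one, along each axis.
    row_chunks = (r1 - 1) // chunk_rows - r0 // chunk_rows + 1
    col_chunks = (c1 - 1) // chunk_cols - c0 // chunk_cols + 1
    return row_chunks * col_chunks

n_rows, n_cols = 20_000, 100_000  # hypothetical: genes x cells
chunk = 64                        # hypothetical square chunk size

one_row = chunks_touched(chunk, chunk, (0, 1), (0, n_cols))
one_col = chunks_touched(chunk, chunk, (0, n_rows), (0, 1))
total = math.ceil(n_rows / chunk) * math.ceil(n_cols / chunk)

print(f"one row touches {one_row} of {total} chunks")
print(f"one column touches {one_col} of {total} chunks")
```

Under these assumptions, either access reads only a small fraction of the chunks. A layout stored strictly by rows would instead make a column read visit every row.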
The loompy API is documented at loompy.readthedocs.org.