btable-py

A binary serialization format for sparse, labeled 2D numeric datasets


License
Other
Install
pip install btable-py==0.1.0

Documentation

A Python interface for the BTable serialization format, providing fast, compact binary serialization for large, sparse, labeled 2D numeric datasets ('binary tables').

A BTable is a binary, on-disk representation of a sparse matrix. The format is inspired by Compressed Row Storage (CRS) and saves space by storing only the indices and values of nonzero cells. It is strictly row-oriented for efficient iteration, and btable-py is not a library for matrix computation or linear algebra.
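To make the CRS idea concrete, here is a minimal pure-Python sketch of the general technique. It is not the actual BTable on-disk layout, and the helper names `to_crs` and `iter_rows` are hypothetical and not part of btable-py. It keeps only nonzero values, their column indices, and one pointer per row boundary, then iterates row by row.

```python
def to_crs(rows):
    """Encode dense rows as CRS-style arrays (illustrative, not the BTable layout)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0.0:
                values.append(value)      # nonzero value
                col_indices.append(col)   # column it came from
        row_ptr.append(len(values))       # where the next row starts
    return values, col_indices, row_ptr


def iter_rows(values, col_indices, row_ptr, n_cols):
    """Yield dense rows one at a time from the CRS-style arrays."""
    for start, end in zip(row_ptr, row_ptr[1:]):
        row = [0.0] * n_cols
        for k in range(start, end):
            row[col_indices[k]] = values[k]
        yield row


rows = [[5.0, 3.0, 1.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
values, col_indices, row_ptr = to_crs(rows)
decoded = list(iter_rows(values, col_indices, row_ptr, n_cols=3))
```

Because each row's cells sit contiguously between two row pointers, reading rows in order is a single forward pass, which is why a row-oriented layout suits iteration better than random column access.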

Note that BTables are not a drop-in replacement for every dataset stored as, e.g., CSV: the efficiency gain is proportional to the sparsity of the dataset. For a pathological, fully nonzero dataset, a BTable can occupy much more space than the equivalent CSV!
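A rough back-of-the-envelope comparison illustrates the trade-off. The byte costs below are assumptions chosen for illustration (a 4-byte index plus an 8-byte value per nonzero cell), not the real BTable encoding, and the estimate ignores headers, labels, and row bookkeeping. The functions `sparse_size` and `csv_size` are hypothetical helpers.

```python
INDEX_BYTES = 4  # assumed cost of one column index (not the actual BTable layout)
VALUE_BYTES = 8  # assumed cost of one stored value (not the actual BTable layout)


def sparse_size(rows):
    """Estimated bytes if only nonzero cells are stored as (index, value) pairs."""
    nonzero = sum(1 for row in rows for v in row if v != 0)
    return nonzero * (INDEX_BYTES + VALUE_BYTES)


def csv_size(rows):
    """Bytes of a plain CSV rendering: cells, commas, and one newline per row."""
    return sum(len(",".join(f"{v:g}" for v in row)) + 1 for row in rows)


dense = [[1.0] * 100 for _ in range(10)]            # every cell nonzero
sparse = [[1.0] + [0.0] * 99 for _ in range(10)]    # one nonzero cell per row

print("dense: ", sparse_size(dense), "vs CSV", csv_size(dense))
print("sparse:", sparse_size(sparse), "vs CSV", csv_size(sparse))
```

Under these assumptions the sparse encoding is far smaller than CSV for the mostly-zero table and several times larger for the fully nonzero one, because short values such as `1` cost two bytes in CSV but twelve as an index/value pair.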

Examples

import btable

# Writing a table
labels = ["login", "view_item", "purchase"]
rows = [[5.0, 3.0, 1.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
btable.write("/path/to/my_table.btable", labels, rows)

# Reading a table
bt = btable.BTable("/path/to/my_table.btable")

print(bt.labels)

for row in bt.rows():
    # Process individual row...
    print(row[0:])