A Python interface for the BTable serialization format, providing fast, compact binary serialization for large, sparse, labeled 2D numeric datasets ('binary tables').
A BTable is basically a binary representation of a sparse matrix on disk, and the format is inspired by the Compressed Row Storage (CRS) format, saving space by only storing the indices/values of nonzero cells. It is designed in a strictly row-oriented format for efficient iteration, and is not a library for matrix computation or linear algebra.
Note that BTables are not a drop-in replacement for all datasets stored as e.g. CSV: the increases in efficiency is proportional to the sparsity of the dataset. For a pathological fully-nonzero dataset, the space occupied can be much larger than a CSV!
import btable # Writing a table labels = ["login", "view_item", "purchase"] rows = [[5.0,3.0,1.0], [2.0,0.0,0.0], [0.0,0.0,0.0]] btable.write("/path/to/my_table.btable", labels, rows) # Reading a table bt = btable.BTable("/path/to/my_table.btable") print(bt.labels) for row in bt.rows(): # Process individual row... print(row[0:])