h5max
h5max handles storing and loading of scipy.sparse data structures in h5py file objects, which is not natively supported. It assumes a simple data structure in which the information for each sample is stored at the index that sample occupies within the datasets.
pip install h5max
import h5py
import h5max
import numpy as np
fh = h5py.File('my_data.h5', 'w')
a = np.zeros((100,100))
b = np.zeros((1000,50))
a[7,1] = 1
b[1,0] = 10
m_list = [a, b]
# store both a, b
h5max.store_sparse(fh, m_list, format='csr')
# load only a (index 0)
a_out = h5max.load_sparse(fh, 0, format='csr')
# load [a,b]
m_list_out = h5max.load_sparse(fh, [0, 1], format='csr', to_numpy=True)
# load all idxs in the data
m_list_out = h5max.load_sparse(fh, format='csr')
fh.close()
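
For intuition about why a helper is needed, here is a minimal sketch of storing sparse matrices by index using only h5py and scipy. It is not h5max's implementation, and the on-disk layout is an assumption made for illustration: one group per sample index, holding the three CSR component arrays plus the matrix shape. The helper names `write_csr` and `read_csr` are hypothetical and are not part of h5max.

```python
import h5py
import numpy as np
from scipy import sparse


def write_csr(parent, idx, matrix):
    """Store one matrix as CSR components in a group named after its index."""
    m = sparse.csr_matrix(matrix)
    g = parent.create_group(str(idx))
    g.create_dataset("data", data=m.data)        # non-zero values
    g.create_dataset("indices", data=m.indices)  # column index of each value
    g.create_dataset("indptr", data=m.indptr)    # row boundaries into data/indices
    g.attrs["shape"] = m.shape                   # needed to rebuild the matrix


def read_csr(parent, idx):
    """Rebuild the CSR matrix stored under the given index."""
    g = parent[str(idx)]
    shape = tuple(int(n) for n in g.attrs["shape"])
    return sparse.csr_matrix(
        (g["data"][:], g["indices"][:], g["indptr"][:]), shape=shape
    )


a = np.zeros((100, 100))
b = np.zeros((1000, 50))
a[7, 1] = 1
b[1, 0] = 10

# In-memory HDF5 file (core driver, no backing store), so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as fh:
    for i, m in enumerate([a, b]):
        write_csr(fh, i, m)
    a_out = read_csr(fh, 0)  # sparse CSR matrix
    b_out = read_csr(fh, 1)
```

Only the non-zero entries and their positions are written, so each sample costs storage proportional to its number of non-zeros rather than to its full shape. Calling `a_out.toarray()` recovers the dense numpy array.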
- Support for csr, csc, coo sparse types
- Support for bsr, dia, dok, lil sparse types
- Support for overwriting
- Flexible data loading and saving (both as sparse and numpy arrays)
- Automatic format detection