h5pickle

Wrap h5py objects to allow pickling


Keywords
h5py, hdf5, pickle, dask, python
License
MIT
Install
pip install h5pickle==0.4.2

Documentation

Pickle-compatible h5py wrapper

This module wraps the h5py classes so that h5py objects can be pickled. The arguments to the h5py.File call are saved, and a new file handle is opened when a Group, Dataset or File is unpickled. Consequently, this only works well on shared filesystems and for reading files (SWMR should be fine too).
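
The save-the-arguments, reopen-on-unpickle idea can be illustrated without h5py at all. The sketch below is not h5pickle's actual code: ReopeningFile, _reopen and roundtrip_demo are hypothetical names, and the built-in open() stands in for h5py.File. It only shows the general technique under those assumptions.

```python
import os
import pickle
import tempfile


class ReopeningFile:
    """Hypothetical stand-in for a file wrapper that reopens itself on unpickling."""

    def __init__(self, *args, **kwargs):
        # Remember the constructor arguments so the file can be reopened later.
        self.init_args = args
        self.init_kwargs = kwargs
        # Assumption: the built-in open() plays the role of h5py.File here.
        self._handle = open(*args, **kwargs)

    def read(self):
        self._handle.seek(0)
        return self._handle.read()

    def close(self):
        self._handle.close()

    def __reduce__(self):
        # Pickle only the saved arguments, never the open handle itself.
        return (_reopen, (self.init_args, self.init_kwargs))


def _reopen(args, kwargs):
    # Called by pickle on load: opens a fresh handle from the saved arguments.
    return ReopeningFile(*args, **kwargs)


def roundtrip_demo():
    """Pickle and unpickle a wrapper, returning facts about the clone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "w") as fh:
            fh.write("hello")
        original = ReopeningFile(path, "r")
        clone = pickle.loads(pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL))
        result = (
            clone is not original,
            clone.read(),
            clone.init_args == original.init_args,
        )
        original.close()
        clone.close()
        return result
```

Because the clone reopens the path it was given, the unpickling process must be able to see the same path, which is why a shared filesystem matters.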

Caching

A Least-Recently-Used (LRU) cache holds h5pickle.File objects, keyed on the arguments passed to h5pickle.File. On unpickling, that cache is checked first, which avoids opening the same file multiple times and makes repeated use of the same file faster.
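
As a rough illustration of an argument-keyed LRU cache, here is a minimal sketch. It is not taken from h5pickle: ArgKeyedLRU, fake_open and the maxsize value are hypothetical, and fake_open merely records calls instead of opening real files.

```python
from collections import OrderedDict


class ArgKeyedLRU:
    """Hypothetical LRU cache whose keys are the opener's call arguments."""

    def __init__(self, opener, maxsize=2):
        self.opener = opener
        self.maxsize = maxsize
        self._items = OrderedDict()

    def get(self, *args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key in self._items:
            # Cache hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Cache miss: open a new object and store it.
        obj = self.opener(*args, **kwargs)
        self._items[key] = obj
        if len(self._items) > self.maxsize:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return obj


opened = []  # records every call that actually "opens" a file


def fake_open(name, mode="r"):
    # Assumption: stands in for an expensive file-open call.
    opened.append((name, mode))
    return {"name": name, "mode": mode}


cache = ArgKeyedLRU(fake_open, maxsize=2)
a1 = cache.get("a.h5", "r")  # miss: opens a.h5
cache.get("b.h5", "r")       # miss: opens b.h5
a2 = cache.get("a.h5", "r")  # hit: same object, a.h5 becomes most recent
cache.get("c.h5", "r")       # miss: opens c.h5 and evicts b.h5
```

Requesting b.h5 again after this sequence would open it anew, since it was the least recently used entry when the cache overflowed.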

Setup

First install the PyPI or conda-forge package, or clone this repository into your path.

pip install h5pickle
conda config --add channels conda-forge
conda install h5pickle

Then you can use h5pickle as a drop-in replacement for h5py.

import h5pickle as h5py

Note that not all features of h5py are supported yet. Pull requests are very welcome. Writing files in particular is problematic, because doing so properly from multiple processes requires MPI support.

Usage

import pickle, h5pickle
f = h5pickle.File('filename.h5', 'r', skip_cache=False) # skip_cache = True by default
f2 = pickle.loads(pickle.dumps(f, protocol=pickle.HIGHEST_PROTOCOL))
f2 == f # True

g = pickle.loads(pickle.dumps(f['/group/'], protocol=pickle.HIGHEST_PROTOCOL)) # works
d = pickle.loads(pickle.dumps(f['/group/set'], protocol=pickle.HIGHEST_PROTOCOL)) # works

Be very careful when using this with any file open flag other than 'r' in a parallel context.

It is recommended to use at least pickle protocol 2. Some features are known to work with lower protocols.

Testing

A few tests are available in the tests/ folder. Run them with

pytest

References

Inspired by

License

All code is available under the MIT license.