WIP: xrcache | on-disk cache for numerical functions working with xarray
Disclaimer
This is work in progress, things change fast.
What is this?
xrcache provides on-disk caching for functions operating on xarray.DataArray
or xarray.Dataset
, see xarray documentation.
How does it work?
xrcache provides a decorator that turns a function into a cached function, which means it saves its result to the cache
folder as xarray.DataArray
or xarray.Dataset
and automatically re-uses this result if it receives the same input again.
Hash Philosophy
Each Dataset
is assigned a hash
for identification. The initial hash for the data has to be provided by the user (see example below). All subsequent hashes are generated deterministically from the hash_input
, i.e., the hash
of the input data, and hash_function
, a hash generated from the signature of the function that is called.
Who needs this?
People who like to work with xarray.DataArray
and xarray.Dataset
and like to store intermediate results of their workflows locally, and speedup recalculation of intermediate steps.
Example
import xarray as xr
import xrcache as xc
# create a generic named dataset and assign a hash to its `attrs`
ds = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")})
ds.attrs.update({"name": "dataset"}) # <- name your data!
ds.attrs.update({xc.keys.hash: "some_hash"}) # <- important
# define some function that works on the dataset, e.g.
def function(dataset: xr.Dataset) -> xr.Dataset:
"""add square of input"""
ds = dataset.copy()
result = dataset.bar ** 2
new_ds = ds.update({"foo": result})
return new_ds
# make it a cached function with xrcache
@xc.cached
def cached_function(dataset):
return function(dataset)
Check out the notebooks in /doc
.
Changelog
v0.0.4: Allow func(..., cache=False, hash='some_other_hash')
for more fine grained control. Add 3 digits of the hash to the cache file to avoid clashes.
v0.0.3: Add array/dataset names to stored files, introduce @stored
, simplify attached data, allow @cached(verbose=True)
v0.0.2: include project description on pypi
v0.0.1: initial commit