Jum
"jum" means "remember" in Thai
An alternative to Joblib's Memory for caching Python function results in files on disk.
It uses the dill package to pickle objects and to help hash function arguments, so it supports any kind of object that dill supports.
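As a rough illustration of how pickling can help hash function arguments, here is a minimal sketch. It is not jum's actual code: `make_cache_key` is a hypothetical helper, and the standard-library `pickle` stands in for dill so that the example has no third-party dependencies.

```python
import hashlib
import pickle


def make_cache_key(func_name, args, kwargs):
    """Return a hex digest identifying one call (hypothetical helper, not jum's API)."""
    # Sort keyword arguments so that f(a=1, b=2) and f(b=2, a=1) share a key.
    payload = (func_name, args, sorted(kwargs.items()))
    # pickle stands in for dill here; dill would cover more object types.
    data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    # Hash the serialized bytes to get a fixed-size key.
    return hashlib.sha1(data).hexdigest()


key = make_cache_key('a_long_running_function', (1, 2), {'scale': 0.5})
```

The digest can then serve as a file name inside the cache directory, so that a later call with equal arguments finds the stored result.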
Use cases
import jum
import numpy as np

@jum.cache(cache_dir='.jum')
def a_long_running_function(array):
    # ... do some CPU-intensive things ...
    return value

a_long_running_function(<some_large_np_array>)
# To configure the compression level (default 2):
@jum.cache(cache_dir='.jum', compresslevel=<0-9>)
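To give a feel for what the compression level trades off, the sketch below writes the same pickled value with Python's `gzip` module at levels 0, 2 and 9 and records the file sizes. The helpers `save_cached` and `load_cached` are hypothetical names used for illustration only, and the on-disk format is an assumption, not jum's actual layout.

```python
import gzip
import os
import pickle
import tempfile


def save_cached(path, value, compresslevel=2):
    """Pickle `value` into a gzip file (hypothetical helper)."""
    with gzip.open(path, 'wb', compresslevel=compresslevel) as f:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_cached(path):
    """Read back a value written by save_cached."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)


payload = list(range(100_000))
sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for level in (0, 2, 9):
        path = os.path.join(tmp, f'value-{level}.pkl.gz')
        save_cached(path, payload, compresslevel=level)
        # The round trip must return the original value at every level.
        assert load_cached(path) == payload
        sizes[level] = os.path.getsize(path)
```

Level 0 stores the data without compressing it, while higher levels spend more CPU time for smaller files, so a low level such as 2 is a plausible default when speed matters more than disk space.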
Installation
pip install jum
Features
- It supports almost any kind of object, including NumPy's ndarray, which is its main use case.
- It is faster and lighter than Joblib's Memory, with a smaller cache footprint.
- It supports file compression using Python's gzip module.
- It uses SHA-1 as the main hashing algorithm, which provides a large 160-bit hash space.
- It now uses xxhash to hash ndarrays specifically, for a speed boost.
To be improved
- Use dill to hash the function body instead of the function code, because some functions' source code cannot be retrieved, especially in the Python console.
- The function file path might not work in the Python console; provide a default value for it.
- Use a faster hash such as xxhash. (Update) Profiling showed that the bottleneck is the pickling step, not the hashing itself.
- Favor the hash that is safer against collisions, even if it is slower (the difference is negligible).
- Hashing the ndarray directly via xxhash makes ndarray hashing about ten times faster.
- Add a verbose mode that shows the time spent hashing (the main overhead of caching).
- Add support for F_CONTIGUOUS ndarrays: by transposing them, we can use xxhash to hash them.
- Take function dependencies (i.e. functions that this function calls) into account.
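The items about hashing the function itself can be pictured with the sketch below. It is one possible approach under stated assumptions, not jum's implementation: `hash_function` is a hypothetical helper that hashes the source text when `inspect.getsource` can find it, and falls back to bytecode, constants and names when it cannot (as can happen in the Python console). It expects plain Python functions.

```python
import hashlib
import inspect


def hash_function(func):
    """Return a SHA-1 hex digest for a plain Python function (hypothetical helper)."""
    try:
        # Preferred: the source text, so that any edit to the function changes the hash.
        source = inspect.getsource(func)
    except OSError:
        # Fallback when no source file is available, e.g. in an interactive console.
        # Nested code objects inside co_consts would need more care than repr().
        code = func.__code__
        source = repr((code.co_code, code.co_consts, code.co_names))
    return hashlib.sha1(source.encode('utf-8')).hexdigest()


def add_one(x):
    return x + 1


def add_two(x):
    return x + 2


digest_one = hash_function(add_one)
digest_two = hash_function(add_two)
```

Neither branch looks at the functions that `func` calls, so a change in a dependency would not invalidate the cache in this sketch.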
Known Problems
- Null-argument problem, where a function has no arguments.
- Using dill to hash the function is overkill; it is far too sensitive. I will fall back to the function's source lines.
- ValueError: ndarray is not C-contiguous happens with some specific ndarrays. Not all ndarrays can be fed to xxhash directly; these are handled by pickle for now.
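One way to approach the contiguity problem is sketched below, under stated assumptions. `hash_ndarray` is a hypothetical helper, and `hashlib.sha1` stands in for xxhash to avoid an extra dependency; both consume the array through the buffer interface, which needs C-contiguous memory. F-contiguous arrays are transposed (a view, no copy), and anything else is copied with `np.ascontiguousarray`.

```python
import hashlib

import numpy as np


def hash_ndarray(arr):
    """Hash an ndarray's raw buffer, whatever its memory layout (hypothetical helper)."""
    if arr.flags['C_CONTIGUOUS']:
        layout, buf = 'C', arr
    elif arr.flags['F_CONTIGUOUS']:
        # The transpose of an F-contiguous array is a C-contiguous view.
        layout, buf = 'F', arr.T
    else:
        # Strided views and similar cases: fall back to a contiguous copy.
        layout, buf = 'copy', np.ascontiguousarray(arr)
    h = hashlib.sha1()  # stand-in for xxhash
    # Include layout, dtype and shape so that equal bytes with a different
    # interpretation do not collide.
    h.update(f'{layout}|{arr.dtype.str}|{arr.shape}'.encode('utf-8'))
    h.update(buf)
    return h.hexdigest()


a = np.arange(12, dtype=np.int64).reshape(3, 4)
f = np.asfortranarray(a)   # same values, column-major memory
s = a[:, ::2]              # strided view, neither C- nor F-contiguous

digest_c = hash_ndarray(a)
digest_f = hash_ndarray(f)
digest_s = hash_ndarray(s)
```

Because the layout tag is part of the hash, the same values stored in C and F order get different keys in this sketch. That costs a cache miss but never returns a wrong result.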