A content-addressable file management system.


HashFS is a content-addressable file management system. What does that mean? Simply put, HashFS manages a directory in which each file is saved at a path derived from the hash of its contents.

Typical use cases for this kind of system are ones where:

  • Files are written once and never change (e.g. image storage).
  • It's desirable to have no duplicate files (e.g. user uploads).
  • File metadata is stored elsewhere (e.g. in a database).


Features

  • Files are stored once and never duplicated.
  • Uses an efficient folder structure optimized for a large number of files. File paths are based on the content hash and are nested by the first n characters of the hash.
  • Can save files from local file paths or readable objects (open file handles, IO buffers, etc.).
  • Can repair the root folder by reindexing all files. This is useful if the hashing algorithm or folder structure options change, or to bring existing files under HashFS management.
  • Supports any hashing algorithm available via hashlib.new.
  • Python 2.7+/3.3+ compatible.
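To make the nested folder structure concrete, here is a minimal sketch of how a content hash could be split into subfolders. It uses only the standard library and is not HashFS's own code; the helper name shard_path is hypothetical, and depth (number of subfolders) and width (characters per subfolder) are assumed to behave like the settings of the same name.

```python
import hashlib


def shard_path(hexdigest, depth=4, width=1):
    """Split the leading characters of a digest into nested folder names.

    Hypothetical helper: `depth` is the number of subfolders, `width` the
    number of characters per subfolder. The rest of the digest becomes
    the file name.
    """
    prefix_len = depth * width
    parts = [hexdigest[i:i + width] for i in range(0, prefix_len, width)]
    parts.append(hexdigest[prefix_len:])
    return "/".join(parts)


# Any algorithm name accepted by hashlib.new() can be used here.
digest = hashlib.new("sha256", b"some content").hexdigest()
print(shard_path(digest, depth=4, width=1))
print(shard_path(digest, depth=3, width=2))
```

Because the path depends only on the content, saving the same bytes twice resolves to the same location, which is what prevents duplicates.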


Install using pip:

pip install hashfs


Designate a root folder for HashFS. If the folder doesn't already exist, it will be created.

NOTE: The algorithm value should be a valid string argument to hashlib.new().
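One way to check an algorithm name before using it is to pass it to hashlib.new() yourself. The sketch below relies only on the standard library; the helper name is_valid_algorithm is hypothetical and not part of HashFS.

```python
import hashlib


def is_valid_algorithm(name):
    """Return True if hashlib.new() accepts `name` (hypothetical helper)."""
    try:
        hashlib.new(name)
    except (ValueError, TypeError):
        # hashlib raises ValueError for unknown names and TypeError for non-strings.
        return False
    return True


print(is_valid_algorithm("sha256"))
print(is_valid_algorithm("not-a-real-algorithm"))

# The names supported by the running interpreter are also listed here.
print(sorted(hashlib.algorithms_available))
```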

Basic Usage

HashFS supports basic file storage, retrieval, and removal as well as some more advanced features like file repair.

Storing Content

Add content to the folder using either readable objects (e.g. StringIO) or file paths (e.g. 'a/path/to/some/file').
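The two kinds of input can be treated uniformly when computing the content hash. The following is a rough standard-library sketch of that idea, not the code behind put(); the function name content_id and its parameters are assumptions made for illustration.

```python
import hashlib


def content_id(source, algorithm="sha256", chunk_size=8192):
    """Return the hex digest of `source`, a file path or a readable object."""
    hasher = hashlib.new(algorithm)
    if hasattr(source, "read"):
        stream, should_close = source, False
    else:
        stream, should_close = open(source, "rb"), True
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            if isinstance(chunk, str):
                # Text streams such as StringIO yield str; hash their UTF-8 bytes.
                chunk = chunk.encode("utf-8")
            hasher.update(chunk)
    finally:
        if should_close:
            stream.close()
    return hasher.hexdigest()
```

Reading in chunks keeps memory use flat for large files, and identical content yields the same ID regardless of whether it came from a path or a buffer.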

Retrieving File Address

Get a file's HashAddress by address ID or path. This address is identical to the address returned by put().

Retrieving Content

Get a BufferedReader handle for an existing file by address ID or path.

NOTE: When getting a file that was saved with an extension, it's not necessary to supply the extension. Extensions are ignored when looking for a file based on the ID or path.
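An extension-agnostic lookup can be approximated with a glob over the extension-less path. This is a sketch under that assumption, using only the standard library; find_stored_file is a hypothetical name and may not match how HashFS resolves files internally.

```python
import glob
import os


def find_stored_file(root, relpath_without_ext):
    """Return the stored file for a path, with or without an extension.

    Hypothetical helper: returns None when nothing matches.
    """
    base = os.path.join(root, relpath_without_ext)
    if os.path.isfile(base):
        return base
    # glob.escape() guards against special characters in the root path.
    matches = sorted(glob.glob(glob.escape(base) + ".*"))
    return matches[0] if matches else None
```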

Removing Content

Delete a file by address ID or path.

NOTE: When a file is deleted, any parent directories that are left empty are deleted as well.
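The cleanup of empty parent directories can be pictured as walking upward from the deleted file until a non-empty folder or the root is reached. The sketch below shows that behavior with the standard library; remove_with_empty_parents is a hypothetical helper, not the HashFS implementation.

```python
import os


def remove_with_empty_parents(path, root):
    """Delete `path`, then remove each emptied parent, stopping at `root`."""
    root = os.path.realpath(root)
    path = os.path.realpath(path)
    os.remove(path)
    parent = os.path.dirname(path)
    # Never climb above (or remove) the root folder itself.
    while parent != root and parent.startswith(root + os.sep):
        try:
            os.rmdir(parent)  # raises OSError if the directory is not empty
        except OSError:
            break
        parent = os.path.dirname(parent)
```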

Advanced Usage

Below are some of the more advanced features of HashFS.

Repairing Files

The files in a HashFS root may not always be in sync with its depth, width, or algorithm settings (e.g. if HashFS takes ownership of a directory whose files weren't previously stored using content hashes, or if the HashFS settings change). These files can be reindexed using repair().

WARNING: It's recommended that a backup of the directory be made before repairing just in case something goes wrong.

Walking Corrupted Files

Instead of actually repairing the files, you can iterate over them for custom processing.

WARNING: HashFS.corrupted() is a generator, so be aware that modifying the file system while iterating could have unexpected results.
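To illustrate what detecting and reindexing out-of-place files involves, here is a standard-library sketch. It is not the code behind repair() or HashFS.corrupted(); the names expected_relpath, misplaced_files, and move_misplaced are hypothetical, and the depth, width, and algorithm parameters are assumed to mirror the settings mentioned above. It requires Python 3.

```python
import hashlib
import os


def expected_relpath(hexdigest, depth=4, width=1):
    """Relative path a digest should live at for the given settings."""
    prefix = depth * width
    parts = [hexdigest[i:i + width] for i in range(0, prefix, width)]
    return os.path.join(*parts, hexdigest[prefix:])


def misplaced_files(root, depth=4, width=1, algorithm="sha256"):
    """Yield (current_path, expected_path) for files stored at the wrong path."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            current = os.path.join(folder, name)
            with open(current, "rb") as handle:
                # Reads the whole file at once; acceptable for a sketch.
                digest = hashlib.new(algorithm, handle.read()).hexdigest()
            extension = os.path.splitext(name)[1]
            relpath = expected_relpath(digest, depth, width) + extension
            expected = os.path.join(root, relpath)
            if os.path.normpath(current) != os.path.normpath(expected):
                yield current, expected


def move_misplaced(root, **options):
    """Move every misplaced file to its expected path; return the count."""
    moved = 0
    # Build the full list first so the walk never sees its own moves.
    for current, expected in list(misplaced_files(root, **options)):
        os.makedirs(os.path.dirname(expected), exist_ok=True)
        os.replace(current, expected)
        moved += 1
    return moved
```

Wrapping the generator in list() before moving anything is one way to avoid the hazard described in the warning above.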

Walking All Files

Iterate over files.

Iterate over folders that contain files (i.e. ignore the nested subfolders that only contain folders).
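The distinction between folders that hold files and folders that only hold other folders can be shown with os.walk. This is a standard-library sketch of the idea rather than HashFS's own iterator, and folders_with_files is a hypothetical name.

```python
import os


def folders_with_files(root):
    """Yield each folder under `root` that directly contains at least one file."""
    for folder, _subfolders, filenames in os.walk(root):
        if filenames:
            yield folder
```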

Computing Size

Compute the size in bytes of all files in the root directory.

Count the total number of files.
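Both figures can be derived from a single walk of the root directory. The sketch below uses only the standard library to show the computation; tree_size_and_count is a hypothetical helper and not part of HashFS.

```python
import os


def tree_size_and_count(root):
    """Return (total_bytes, file_count) for all files under `root`."""
    total_bytes = 0
    file_count = 0
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(folder, name))
            file_count += 1
    return total_bytes, file_count
```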

For more details, please see the full documentation at http://hashfs.readthedocs.org.