
Stable, fast hash of the content of a collection of files and directories, optionally including permissions, dates, etc.

pip install file-collection-hash==1.0.0


file-collection-hash: Generate stable hash of a directory or

License: MIT

A Python command-line tool and callable function that efficiently computes a repeatable hash string for the content of a directory or a collection of files.


The Python package file-collection-hash provides a command-line tool as well as a runtime function to efficiently generate a stable content hash for a directory or collection of files. In general, a directory created with rsync -a old_dir/ new_dir/ will produce the same hash as the original. The hash includes the data of all files, so it is reliable regardless of file timestamps, etc.

Files within a directory are processed in alphabetically sorted order, so that hashes remain stable across directory reconstruction.

Relative pathnames are included in the hash, so that if a file is renamed, the hash will change.

By default, file modification timestamps, file owner/UID, and file group/GID are ignored for the purposes of hashing, so that directories cloned onto different systems will hash the same even if a different user owns the directory or the UID/GID mappings are different. Options are provided to enable inclusion of these properties in the hash.

By default, file permission/mode bits (e.g., Read, Write, Execute) are included in the hash; this allows applications to recognize chmod operations as significant and requiring update.

In general, the default options produce a hash that changes under roughly the same conditions in which git status would show a change.
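The default rules described above (sorted traversal, relative paths and permission bits included, timestamps and ownership ignored) can be illustrated with a small pure-Python sketch. This is not how file-collection-hash works internally, since the package pipes tar into a hash command; hashlib and os.walk are swapped in here purely for illustration, and the function name sketch_tree_hash is hypothetical. Its output will not match the hashes produced by the tool.

```python
import hashlib
import os
import stat


def sketch_tree_hash(root: str) -> str:
    """Illustrative only: hash relative paths, mode bits, and file data.

    Timestamps, owner, and group are never read, so they cannot affect
    the result. Empty directories are ignored in this simplified sketch.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the order in which subdirectories are visited
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            st = os.lstat(full)
            mode = stat.S_IMODE(st.st_mode)  # permission bits only
            # Path, mode, and size frame each entry unambiguously.
            digest.update(f"{rel}\0{mode:o}\0{st.st_size}\0".encode())
            if stat.S_ISREG(st.st_mode):  # symlinks contribute metadata only
                with open(full, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        digest.update(chunk)
    return digest.hexdigest()
```

Under this sketch, touching a file leaves the hash unchanged, while renaming it or running chmod on it produces a different hash.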

The hashing function can be any filter command that takes a byte stream as input and produces a whitespace-free textual hash as output. Any output from the first whitespace on is stripped.
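As a rough illustration of the stripping rule, the sketch below keeps only the text before the first whitespace in a filter's output. The helper name first_token is hypothetical, and skipping any leading whitespace is an assumption of this sketch rather than documented behavior.

```python
import hashlib


def first_token(filter_output: str) -> str:
    """Return the text before the first whitespace in a hash filter's output.

    Assumption: leading whitespace, if any, is skipped rather than
    treated as the end of the hash.
    """
    parts = filter_output.split(None, 1)
    if not parts:
        raise ValueError("hash filter produced no output")
    return parts[0]


# sha256sum prints "<hex digest>  -" when it reads from stdin;
# that output line is emulated here with hashlib.
digest = hashlib.sha256(b"hello").hexdigest()
emulated_output = f"{digest}  -\n"
print(first_token(emulated_output) == digest)  # True
```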

file-collection-hash delegates all of the heavy lifting to two highly optimized native external commands, piped together:

  1. tar is used to render all included files and directories into a repeatable byte stream. Command options on tar are used to sort the input files and to hide variations in owner, group, modify timestamps, and permission bits as required. The output of tar is piped directly into the hashing filter.
  2. The hashing filter command (by default sha256sum) has its stdin piped directly from the tar output.
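The two-stage pipeline above might be wired together along the following lines. This is a sketch under stated assumptions, not the package's actual implementation: the function names build_tar_command and hash_directory are hypothetical, and the GNU tar options shown (--sort=name, --owner, --group, --numeric-owner, --mtime) are one plausible way to normalize ordering, ownership, and timestamps.

```python
import subprocess
from typing import List, Sequence


def build_tar_command(directory: str = ".", exclude: Sequence[str] = ()) -> List[str]:
    """Build a GNU tar command that emits a repeatable archive on stdout."""
    cmd = [
        "tar",
        "--sort=name",      # deterministic member order (GNU tar 1.28+)
        "--owner=0",        # hide the file owner
        "--group=0",        # hide the file group
        "--numeric-owner",  # store IDs only, not user or group names
        "--mtime=@0",       # pin every modification time to the epoch
    ]
    cmd += [f"--exclude={pattern}" for pattern in exclude]
    cmd += ["-C", directory, "-cf", "-", "."]
    return cmd


def hash_directory(directory: str = ".", exclude: Sequence[str] = (),
                   hash_cmd: Sequence[str] = ("sha256sum",)) -> str:
    """Pipe tar's stdout straight into the hash filter's stdin."""
    tar = subprocess.Popen(build_tar_command(directory, exclude),
                           stdout=subprocess.PIPE)
    hasher = subprocess.Popen(list(hash_cmd), stdin=tar.stdout,
                              stdout=subprocess.PIPE)
    tar.stdout.close()  # so tar sees a broken pipe if the filter exits early
    output, _ = hasher.communicate()
    if tar.wait() != 0 or hasher.returncode != 0:
        raise RuntimeError("tar or the hash filter failed")
    return output.decode().split(None, 1)[0]  # drop everything after the hash
```

Running hash_directory requires GNU tar and sha256sum on the PATH. Because permission bits are left untouched in this sketch, a chmod changes the resulting hash, in line with the default behavior described earlier.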

This package was originally developed as part of a solution to update .tar.gz files, triggering dependent actions, only when there is a material change in the content being bundled, ignoring differences in timestamp and file owner/group settings.
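A minimal sketch of that workflow follows: a rebuild runs only when the content hash differs from the one recorded last time. The function rebuild_if_changed, the stamp file, and the callback parameters are all hypothetical names introduced for illustration; the hash is supplied as a callable so the sketch stays independent of any particular hashing API.

```python
from typing import Callable, Optional


def rebuild_if_changed(compute_hash: Callable[[], str],
                       stamp_path: str,
                       rebuild: Callable[[], None]) -> bool:
    """Run rebuild() only when the content hash differs from the stored one.

    Returns True if a rebuild happened, False if the content was unchanged.
    """
    new_hash = compute_hash()
    old_hash: Optional[str]
    try:
        with open(stamp_path, "r", encoding="utf-8") as f:
            old_hash = f.read().strip()
    except FileNotFoundError:
        old_hash = None  # first run: nothing recorded yet
    if new_hash == old_hash:
        return False
    rebuild()
    # Record the hash only after the rebuild succeeded.
    with open(stamp_path, "w", encoding="utf-8") as f:
        f.write(new_hash + "\n")
    return True
```

With the package installed, compute_hash could be something like lambda: file_collection_hash('scripts'), and rebuild could regenerate the .tar.gz file.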



Python: Python 3.7+ is required. See your OS documentation for instructions.

From PyPI

The current released version of file-collection-hash can be installed with

pip3 install file-collection-hash

From GitHub

Poetry is required; it can be installed with:

curl -sSL | python3 -

Clone the repository and install file-collection-hash into a private virtualenv with:

cd <parent-folder>
git clone
cd file-collection-hash
poetry install

You can then launch a bash shell with the virtualenv activated using:

poetry shell


Command Line

Example usage:

$ file-collection-hash --exclude=.git --exclude=.venv
$ file-collection-hash -C scripts
$ file-collection-hash -C scripts --no-ignore-owner --no-ignore-group
$ cd scripts
$ file-collection-hash


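#!/usr/bin/env python3

from file_collection_hash import file_collection_hash

# Hash the current directory, skipping the .git and .venv subdirectories.
print(file_collection_hash(exclude=['.git', '.venv']))
# Hash the scripts directory, including file owner and group in the hash.
print(file_collection_hash('scripts', ignore_owner=False, ignore_group=False))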

Known issues and limitations

  • TBD.

Getting help

Please report any problems/issues here.


Pull requests welcome.


file-collection-hash is distributed under the terms of the MIT License. The license applies to this file and other files in the GitHub repository hosting this file.

Authors and history

The author of file-collection-hash is Sam McKelvie.