strumpf

Skymind test resource management for paunchy files


License
Apache-2.0
Install
pip install strumpf==0.3.4

Documentation

Strumpf - Skymind Test Resource Upload Management for Paunchy Files

Strumpf is a test resource management tool for very large files that fits into your git workflow. Instead having to rely on git LFS, Strumpf keeps references to the actual test files and downloads them on demand at test time. This way your test resource folder won't exceed size limits you want to avoid.


Build Status License PyPI version

Installation

Strumpf is on PyPI, so you can install it with pip:

pip install strumpf

Alternatively, you can build the project locally as follows:

git clone https://www.github.com/deeplearning4j/strumpf.git
cd strumpf
python setup.py install

Strumpf command line interface (CLI)

Installing Strumpf exposes a command line tool called strumpf. You can use this tool to configure your test environment. To initialize a new Strumpf configuration, type

Configuration

strumpf configure


███████╗████████╗██████╗ ██╗   ██╗███╗   ███╗██████╗ ███████╗
██╔════╝╚══██╔══╝██╔══██╗██║   ██║████╗ ████║██╔══██╗██╔════╝
███████╗   ██║   ██████╔╝██║   ██║██╔████╔██║██████╔╝█████╗  
╚════██║   ██║   ██╔══██╗██║   ██║██║╚██╔╝██║██╔═══╝ ██╔══╝  
███████║   ██║   ██║  ██║╚██████╔╝██║ ╚═╝ ██║██║     ██║
╚══════╝   ╚═╝   ╚═╝  ╚═╝ ╚═════╝ ╚═╝     ╚═╝╚═╝     ╚═╝

...

First, Strumpf will ask you what the project you want to track is called, where your local resources are stored (full path to folder), and Azure credentials for a blob storage account. After providing all information, you can change into your test resource directory and query its status using strumpf status, which will prompt git-like information:

  • it prompts which project you use (below "dl4j")
  • it tells you about your untracked large files
  • it shows you which large files have been modified
  • it tells you which large files are already staged for upload

An example output would look as follows:

strumpf status

>>> Working on project dl4j
 Changes to be uploaded:
 (use "strumpf reset <file>..." to unstage files)

        modified:    /home/max/code/strumpf-test-folder/cgoban.jar


 Untracked large files:
 (use "strumpf add <file>..." to include in what will be committed)

        /home/max/code/strumpf-test-folder/test.jar

Tracking large files

Next, to add files to strumpf tracking system you use strumpf add <file or path>, to track all large files recently added in your test folder you could for instance issue the command strumpf add .. You can either use absolute paths, or paths relative to the test folder you're working with.

To see the effect of adding files you can query strumpf status afterwards again to see that your previously untracked or modified files are now staged for upload by strumpf.

Uploading files

The final step is strumpf upload, which does several things for you:

  • Strumpf will compress your staged files using gzip.
  • It will then compute sha256 hashes for both original and compressed files.
  • The compressed files will be uploaded to Azure blob storage. The original files will be moved to a local caching folder.
  • After completion of the upload, all large files will be removed locally and only references to them will be kept, including the file hashes.
  • Your large files are now hosted externally on Azure and you can git commit the file references instead.

Downloading files

To download a file using the CLI, you use

strumpf download -f <relative-path-to-file>

This command will check if your cache already has the file you're looking for and check if its hash matches the one found in your local resource folder. If the hashes match, the files won't be downloaded again. If they do not match or if they are no cached files at all, the file will be downloaded.

Finally, to download all available remote resource files, use:

strumpf bulk_download

Changing projects

Should you have multiple projects you want to track with Strumpf, you can do that quite easily. For instance, if you want to add a new project, simply use strumpf configure again and use a different project name and configuration as required. You can check at any point which projects are currently tracked by Strumpf by typing

strumpf projects

To switch to another project, simply use

strumpf set_project <project>