Strumpf - Skymind Test Resource Upload Management for Paunchy Files
Strumpf is a test resource management tool for very large files that fits into your git workflow. Instead of having to rely on git LFS, Strumpf keeps references to the actual test files and downloads them on demand at test time. This way your test resource folder won't exceed the size limits you want to stay under.
Strumpf is on PyPI, so you can install it with
pip install strumpf
Alternatively, you can build the project locally as follows:
git clone https://www.github.com/deeplearning4j/strumpf.git
cd strumpf
python setup.py install
Strumpf command line interface (CLI)
Installing Strumpf exposes a command line tool called
strumpf. You can use this tool to configure
your test environment. To initialize a new Strumpf configuration, type
strumpf configure
...
First, Strumpf will ask you for the name of the project you want to track, where your local resources are stored (full path to the folder), and the Azure credentials for a blob storage account. After providing all information, you can change into your
test resource directory and query its status using
strumpf status, which prints git-like information:
- it shows which project you are working on ("dl4j" in the example below)
- it tells you about your untracked large files
- it shows you which large files have been modified
- it tells you which large files are already staged for upload
An example output would look as follows:
strumpf status
>>> Working on project dl4j

Changes to be uploaded:
  (use "strumpf reset <file>..." to unstage files)

        modified: /home/max/code/strumpf-test-folder/cgoban.jar

Untracked large files:
  (use "strumpf add <file>..." to include in what will be committed)

        /home/max/code/strumpf-test-folder/test.jar
Tracking large files
Next, to add files to Strumpf's tracking system, use
strumpf add <file or path>. For instance, to track all
large files recently added to your test folder, you could issue the command
strumpf add . (with a trailing dot for the current directory). You can use either absolute paths or paths relative to the test folder you're working with.
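The handling of absolute and relative paths can be illustrated with a small Python sketch. It is not taken from Strumpf's source: the function name `resolve_resource_path` and the rule that relative paths are joined onto the configured resource folder are assumptions made for this example.

```python
from pathlib import Path


def resolve_resource_path(user_path: str, resource_root: str) -> Path:
    """Turn a path given on the command line into a full path.

    Hypothetical helper: absolute paths are returned unchanged, relative
    paths are interpreted relative to the (assumed) resource root folder.
    """
    candidate = Path(user_path)
    if candidate.is_absolute():
        return candidate
    return Path(resource_root) / candidate


if __name__ == "__main__":
    root = "/home/max/code/strumpf-test-folder"
    # A relative path is joined onto the resource root.
    print(resolve_resource_path("test.jar", root))
    # An absolute path is kept as it is.
    print(resolve_resource_path(root + "/cgoban.jar", root))
```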
To see the effect of adding files, you can run
strumpf status again afterwards; your previously untracked or modified files are now staged for upload by Strumpf.
The final step is
strumpf upload, which does several things for you:
- Strumpf will compress your staged files.
- It will then compute sha256 hashes for both the original and the compressed files.
- The compressed files will be uploaded to Azure blob storage. The original files will be moved to a local caching folder.
- After completion of the upload, all large files will be removed locally and only references to them will be kept, including the file hashes.
- Your large files are now hosted externally on Azure and you can git commit the file references instead.
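The local part of the upload steps above can be sketched in Python as follows. This is a minimal illustration and not Strumpf's actual implementation: gzip as the compression format, the function names `sha256_of` and `stage_for_upload`, the `.gz` and `.ref.json` suffixes, and the JSON layout of the reference file are all assumptions made for the example, and the Azure upload itself is left out.

```python
import gzip
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_for_upload(original: Path, cache_dir: Path) -> Path:
    """Compress a file, hash both versions, cache the original, leave a reference.

    Hypothetical helper: names and file layout are assumptions, not Strumpf's API.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    compressed = cache_dir / (original.name + ".gz")  # assumed: gzip compression
    with original.open("rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)

    reference = {
        "full_hash": sha256_of(original),
        "compressed_hash": sha256_of(compressed),
    }
    # The compressed file would be uploaded to blob storage at this point.

    # Move the original into the cache and keep only a small reference file.
    shutil.move(str(original), str(cache_dir / original.name))
    ref_path = original.with_name(original.name + ".ref.json")  # assumed suffix
    ref_path.write_text(json.dumps(reference, indent=2))
    return ref_path


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    large_file = workdir / "test.jar"
    large_file.write_bytes(b"x" * 10_000)
    ref = stage_for_upload(large_file, workdir / "cache")
    print(ref.read_text())
```

After such a step, only the small reference file remains in the resource folder, which is what would be committed to git.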
To download a file using the CLI, you use
strumpf download -f <relative-path-to-file>
This command will check whether your cache already has the file you're looking for and whether its hash matches the one found in your local resource folder. If the hashes match, the file won't be downloaded again. If they do not match, or if there is no cached file at all, the file will be downloaded.
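The cache check described above boils down to a hash comparison. The following Python sketch shows one way to express it; the names `file_sha256` and `needs_download` are hypothetical and the code is not taken from Strumpf itself.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(cached_file: Path, expected_hash: str) -> bool:
    """Decide whether a resource has to be fetched from remote storage.

    Hypothetical helper: expected_hash stands for the hash stored in the
    file reference inside the local resource folder.
    """
    if not cached_file.is_file():
        return True  # nothing cached yet
    # Download again only if the cached copy does not match the reference.
    return file_sha256(cached_file) != expected_hash


if __name__ == "__main__":
    cache = Path(tempfile.mkdtemp())
    cached = cache / "test.jar"
    cached.write_bytes(b"payload")
    reference_hash = hashlib.sha256(b"payload").hexdigest()
    print(needs_download(cached, reference_hash))         # cached copy matches
    print(needs_download(cache / "missing.jar", reference_hash))  # not cached
```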
Finally, to download all available remote resource files, use:
Should you have multiple projects you want to track with Strumpf, you
can do that quite easily. For instance, if you want to add a new project, simply use
strumpf configure again and use a different project name and configuration as required. You can check at any point which projects are currently tracked by Strumpf by typing
To switch to another project, simply use
strumpf set_project <project>