Architect jobs for running analyses
- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License
Overview
This tool is intended to automate the generation of scripts that run analyses on data sets. To use it, you will need a data set that has been created (or annotated) with dtool. It aims to help by:
- Removing the need to know where specific data items are stored in a data set
- Providing a means to split an analysis into several chunks (file-based parallelization)
- Providing a framework for seamlessly running an analysis inside a container
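File-based parallelization comes down to partitioning a data set's item identifiers into groups, each handled by a separate job. The sketch below shows one way to do this with round-robin assignment. The function `split_into_chunks` is hypothetical and not part of jobarchitect, which may divide items differently.

```python
def split_into_chunks(identifiers, nchunks):
    """Partition identifiers into at most nchunks non-empty lists (round-robin).

    Hypothetical helper for illustration; not part of jobarchitect.
    """
    if nchunks < 1:
        raise ValueError("nchunks must be at least 1")
    chunks = [[] for _ in range(nchunks)]
    for i, identifier in enumerate(identifiers):
        # Item i goes to chunk i mod nchunks, so sizes differ by at most one.
        chunks[i % nchunks].append(identifier)
    # Drop empty chunks when there are fewer items than requested jobs.
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    for chunk in split_into_chunks(["id1", "id2", "id3", "id4", "id5"], 2):
        print(chunk)
```

Round-robin keeps chunk sizes within one item of each other; contiguous slicing would work just as well if neighbouring items should stay together.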
Design
This project has two main components. The first is a command line tool named sketchjob, intended to be used by the end user. It is used to generate scripts defining jobs to be run. The second (_analyse_by_ids) is a command line tool that is used by the scripts generated by sketchjob. The end user is not meant to run this second tool directly.
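A minimal sketch of the generation step, assuming a plain string template that mirrors the local-backend sample output shown under Use. The name `render_local_script` is hypothetical; the real sketchjob may build its scripts differently.

```python
def render_local_script(tool_path, dataset_path, output_root, identifiers):
    """Return the text of a bash script that calls _analyse_by_ids.

    Hypothetical re-creation of the script layout shown in this README;
    not jobarchitect's own template code.
    """
    lines = [
        "#!/bin/bash",
        "",
        "_analyse_by_ids \\",
        "    --tool_path={} \\".format(tool_path),
        "    --input_dataset_path={} \\".format(dataset_path),
        "    --output_root={} \\".format(output_root),
        # All identifiers handled by this job go on the final line.
        "    " + " ".join(identifiers),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_local_script(
        "my_smart_tool.py", "example_dataset/", "output/",
        ["290d3f1a902c452ce1c184ed793b1d6b83b59164"],
    ), end="")
```

Generating one such script per chunk of identifiers is what allows the jobs to run in parallel.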
Installation
To install the jobarchitect package:
$ cd jobarchitect
$ python setup.py install
Use
The jobarchitect tool only works with "smart" tools.
A "smart" tool is a tool that understands dtoolcore datasets, has no positional command line arguments, and supports the named arguments --dataset-path, --identifier and --output-directory.
The tool should only process the dataset item specified by the identifier
and write all output to the specified output directory.
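The sketch below illustrates the smart-tool contract: only the three named arguments, one item processed, all output written to the output directory. To stay self-contained it does not use dtoolcore. Instead it assumes, purely for illustration, that items live under `<dataset>/data` and that an identifier is the SHA-1 digest of the item's contents; a real smart tool should resolve identifiers through dtoolcore. The helper names are hypothetical.

```python
import argparse
import hashlib
import os


def sha1_hexdigest(path):
    """Return the SHA-1 hex digest of a file's contents."""
    hasher = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            hasher.update(block)
    return hasher.hexdigest()


def find_item_path(dataset_path, identifier):
    """Locate the file for identifier.

    ASSUMPTION for illustration only: items live under <dataset>/data and an
    identifier is the SHA-1 of the item's contents. Use dtoolcore in practice.
    """
    data_dir = os.path.join(dataset_path, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            if sha1_hexdigest(path) == identifier:
                return path
    raise KeyError("No item with identifier {}".format(identifier))


def analyse(dataset_path, identifier, output_directory):
    """Process one item and write all output into output_directory."""
    item_path = find_item_path(dataset_path, identifier)
    output_path = os.path.join(output_directory, os.path.basename(item_path))
    with open(output_path, "w") as fh:
        # Record which item was processed and where it was found.
        fh.write(identifier + "\n")
        fh.write(item_path + "\n")
    return output_path


def main(argv=None):
    # Named arguments only: smart tools take no positional arguments.
    parser = argparse.ArgumentParser(description="Example smart tool")
    parser.add_argument("--dataset-path", required=True)
    parser.add_argument("--identifier", required=True)
    parser.add_argument("--output-directory", required=True)
    args = parser.parse_args(argv)
    analyse(args.dataset_path, args.identifier, args.output_directory)


if __name__ == "__main__":
    main()
```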
A dtool dataset can be created using dtool. Below is an example:
$ dtool new dataset
project_name [project_name]:
dataset_name [dataset_name]: example_dataset
...
$ echo "My example data" > example_dataset/data/my_file.txt
$ datatool manifest update example_dataset/
Create an output directory:
$ mkdir output
Then you can generate analysis run scripts with:
$ sketchjob my_smart_tool.py example_dataset output/
#!/bin/bash

_analyse_by_ids \
    --tool_path=my_smart_tool.py \
    --input_dataset_path=example_dataset/ \
    --output_root=output/ \
    290d3f1a902c452ce1c184ed793b1d6b83b59164
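The generated script hands its identifiers to _analyse_by_ids. As a rough sketch of what that step involves, the hypothetical `build_tool_commands` below expands the script's arguments into one smart-tool invocation per identifier, using the smart-tool arguments described above. Running the tool with the current Python interpreter and passing `--output_root` through unchanged as `--output-directory` are assumptions; jobarchitect's actual _analyse_by_ids may behave differently.

```python
import shlex
import sys


def build_tool_commands(tool_path, input_dataset_path, output_root, identifiers,
                        interpreter=sys.executable):
    """Return one argv list per identifier for invoking a smart tool.

    Hypothetical sketch: assumes the tool is a Python script and that
    output_root is used directly as the tool's --output-directory.
    """
    return [
        [interpreter, tool_path,
         "--dataset-path", input_dataset_path,
         "--identifier", identifier,
         "--output-directory", output_root]
        for identifier in identifiers
    ]


if __name__ == "__main__":
    commands = build_tool_commands(
        "my_smart_tool.py", "example_dataset/", "output/",
        ["290d3f1a902c452ce1c184ed793b1d6b83b59164"])
    for cmd in commands:
        # Print rather than execute; subprocess.run(cmd, check=True) would run it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```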
Try the script with:
$ sketchjob my_smart_tool.py example_dataset output/ > run.sh
$ bash run.sh
$ cat output/first_image.png
290d3f1a902c452ce1c184ed793b1d6b83b59164
/private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png
Working with Docker
Building a Docker image
For the tests to pass, you will need to build an example Docker image, which you do with the provided script:
$ bash build_docker_image.sh
Running code with the Docker backend
By inspecting the script and the associated Dockerfile, you can get an idea of how to build Docker images that can be used with the jobarchitect Docker backend, e.g.:
$ sketchjob scripts/my_smart_tool.py ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
#!/bin/bash

IMAGE_NAME=jicscicomp/jobarchitect
docker run \
    --rm \
    -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro \
    -v /Users/olssont/junk/output:/output \
    -v /Users/olssont/sandbox/scripts:/scripts:ro \
    $IMAGE_NAME \
    _analyse_by_ids \
    --tool_path=/scripts/my_smart_tool.py \
    --input_dataset_path=/input_dataset \
    --output_root=/output \
    290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4
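The Docker backend wraps the same _analyse_by_ids call in `docker run`, bind-mounting the dataset read-only at /input_dataset, the output directory at /output, and the directory containing the tool read-only at /scripts. The hypothetical `docker_run_args` below reproduces that mapping as an argument list. It assumes a POSIX host and the container paths from the sample, and it expands `~` and resolves relative paths against the current directory, which is consistent with how the sample turns `~/junk/...` and `scripts/my_smart_tool.py` into absolute paths. It is an illustration, not jobarchitect's own code.

```python
import os


def docker_run_args(image_name, tool_path, dataset_path, output_root, identifiers):
    """Build a docker run argv mirroring the volume layout in the sample above.

    Hypothetical sketch: container paths /input_dataset, /output and /scripts
    are copied from the sample script, not from jobarchitect's source.
    """
    # Docker bind mounts need absolute host paths.
    dataset_dir = os.path.abspath(os.path.expanduser(dataset_path))
    output_dir = os.path.abspath(os.path.expanduser(output_root))
    tool_abspath = os.path.abspath(os.path.expanduser(tool_path))
    scripts_dir, tool_name = os.path.split(tool_abspath)
    return [
        "docker", "run", "--rm",
        "-v", "{}:/input_dataset:ro".format(dataset_dir),
        "-v", "{}:/output".format(output_dir),
        "-v", "{}:/scripts:ro".format(scripts_dir),
        image_name,
        "_analyse_by_ids",
        # The tool is addressed by its path inside the container.
        "--tool_path=/scripts/{}".format(tool_name),
        "--input_dataset_path=/input_dataset",
        "--output_root=/output",
    ] + list(identifiers)
```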