Tool for launching pipelines locally and remotely


Keywords
data-science, docker, hpc-applications, pipelines, python, singularity
License
MIT
Install
pip install clowdr==0.0.17

Documentation

Clowdr

Clowdr is a command-line utility for iteratively developing pipelines, deploying them at scale, and sharing data and derivatives.


Overview

Clowdr can be thought of as a cloud execution utility for Boutiques, the JSON-based descriptive command-line framework. As Boutiques and the Boutiques tools allow the encapsulation, validation, evaluation, and deployment of command-line routines, Clowdr inherits and extends this functionality to remote datasets and computational resources.

Clowdr exposes several levels of evaluation: local, cluster, cloud, and share. The local mode runs tasks using the system scheduler and, paired with the development flag (-d), enables rapid prototyping of tools, descriptors, and invocations. The cluster mode generates exactly the same executions as local but submits them through a cluster's scheduler for parallel execution. Similarly, cloud runs the tasks on a remote cloud such as Amazon. Finally, the share mode launches a light-weight webserver and ultimately generates a static HTML page, which can be stored and redistributed, documenting provenance and run information for the launched tasks.
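Across the local, cluster, and cloud modes the command line keeps the same general shape; only the mode name, the data location, and a few mode-specific arguments change. As a rough sketch inferred from the examples later in this README (the concrete commands below are the reference):

clowdr <mode> <descriptor.json> <invocation.json or directory of invocations> <clowdr task directory> <data location> [mode-specific arguments] [options]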

System Requirements

Clowdr requires Python 3 and either Docker or Singularity. It has only been tested on macOS and Linux, though none of its requirements are specific to these operating systems, suggesting that it may also function properly on Windows.
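Only one of Docker or Singularity is needed. If you are unsure whether the prerequisites are present on your machine, a quick check is:

python3 --version
docker --version
singularity --version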

Installation Instructions

Installation is quite simple - just run:

pip install clowdr
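If you prefer to keep Clowdr isolated from your system Python, one optional pattern is to install it inside a virtual environment:

python3 -m venv clowdr-env
source clowdr-env/bin/activate
pip install clowdr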

Docker

Clowdr is available on Docker Hub, and can be downloaded with:

docker pull clowdr/clowdr
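As a sketch of using the image directly, assuming its entrypoint is the clowdr executable (check the image documentation if this is not the case), the help text can be printed with:

docker run --rm -ti clowdr/clowdr -h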

Singularity

Clowdr is also available on Singularity Hub, and can be downloaded with:

singularity pull shub://clowdr/clowdr
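The pull saves an image file locally; its name depends on your Singularity version. As a sketch, assuming the image was saved as clowdr.simg:

singularity run ./clowdr.simg -h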

Usage

(For up-to-date command lines, please check out our documentation.)

Below we'll explore each of the main modes of operation for Clowdr. If in doubt, always feel free to turn back to the help text:

clowdr -h

Local

From this directory, assuming the BIDS dataset ds114 has been downloaded to /data/ds114 and your system has Docker installed, run:

clowdr local examples/descriptor_d.json examples/invocation.json examples/task/ /data/ds114/ -v /data/ds114/:/data/ds114 -b

What you just did was launch Clowdr in local mode with the tool examples/descriptor_d.json and the invocation examples/invocation.json. Outputs are stored in the Clowdr directory, examples/task/; the data lives at /data/ds114 and is mounted to the same location inside the container, -v /data/ds114:/data/ds114; and the dataset happens to be organized according to the BIDS specification, -b. If you also want verbose output, use -V; to develop, use -d; other options can be discovered with the help flag, -h.

If the data weren't organized in BIDS format, we could provide a directory of invocations in place of examples/invocation.json to run a group of tasks, or a single invocation to run one task, omitting the -b flag in both cases.
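As a sketch of the non-BIDS case, assuming a hypothetical dataset at /data/mydata/ and a hypothetical directory examples/invocations/ containing one invocation JSON per task:

clowdr local examples/descriptor_d.json examples/invocations/ examples/task/ /data/mydata/ -v /data/mydata/:/data/mydata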

You can now look in the Clowdr directory to see the outputs of this pipeline.

Cluster

If you want to scale up your analysis, you can then turn to the cluster mode. The arguments supplied are exactly the same, with some minor additions: adding the hostname to your data location, specifying your cluster type (here, slurm), and providing your account identifier, job name, and so on.

clowdr cluster ./examples/descriptor_s.json ./examples/invocation.json ./examples/task/ server.hostname.ca:/path/to/data/ slurm -v /path/to/data/:/data/ --account my-account-id --jobname clowdr-taskname -b

The execution takes place exactly as in the local mode; here we specified a Singularity version of the descriptor. Flags such as -d for development/single-execution mode are also consistent in this mode and are helpful for prototyping analyses prior to large executions.
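For instance, a prototyping run can add -d to the cluster command above so that only a single task is executed before committing to the full set:

clowdr cluster ./examples/descriptor_s.json ./examples/invocation.json ./examples/task/ server.hostname.ca:/path/to/data/ slurm -v /path/to/data/:/data/ --account my-account-id --jobname clowdr-taskname -b -d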

Cloud

Presuming you ran locally and were happy with the results, but have larger collections of data you'd like to process, and don't have access to a cluster, you can turn to the cloud. If you've uploaded the same dataset to Amazon Web Services S3 at s3://mybucket/ds114/, and have your credentials stored in this directory at credentials.csv, run:

clowdr cloud examples/descriptor_d.json examples/invocation.json s3://mybucket/clowdr/ s3://mybucket/ds114/ aws credentials.csv -bv -r us-east-1

Here, you did the same as above, except in cloud mode: the data is remote on S3, the Amazon endpoint is aws, and your Amazon region is set to us-east-1.

Share

Once Clowdr tasks are launched, they return a directory containing the output task information, either on Amazon S3 or locally, depending on the parameters provided. The share mode lets you quickly inspect and explore the launched tasks, gives updates on their status, and ultimately produces a static HTML page that can be downloaded and shared alongside the processed derivatives as provenance information about the execution. You can point the share service at either your own Clowdr output directory or, as in the example packaged with the repository, the path in the line below:

clowdr share ./examples/task/bids-example/clowdr/ -d
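If your tasks were launched in cloud mode, the same idea applies to the remote output location. As a sketch, assuming the S3 Clowdr directory from the cloud example above:

clowdr share s3://mybucket/clowdr/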

Documentation

For detailed and up-to-date documentation, check out our Read the Docs page at clowdr.rtfd.io.

License

This project is covered under the MIT License.

Issues

If you're having trouble, notice a bug, or want to contribute (such as a fix to the bug you may have just found), feel free to open an issue or pull request. Enjoy!