Deploy PlantIT workflows on laptops, servers, or clusters.


Keywords
hpc, phenomics, python, singularity, slurm
License
BSD-3-Clause
Install
pip install plantit-cli==0.3.31

Documentation

plantitCLI

CLI, Python library and science gateway for high-throughput phenotyping on clusters and the CyVerse cloud

  • generate job scripts and launch containers to a SLURM cluster with a single command
  • add the plantit-action to a GitHub Actions workflow for continuous analysis
  • automatically transfer data and results to and from the CyVerse Data Store
  • discover or publish workflows and monitor your submissions in the web UI

CI Coverage Status Status

In development, not stable.

Contents

Requirements

  • Python 3.8+

Installation

To install with pip:

pip install plantit

Quickstart

The plantit CLI parses YAML configuration files, generates job scripts, and submits Singularity container workflows to a SLURM scheduler in the local environment or to a remote cluster via SSH.

At minimum, a configuration file must contain:

  • image
  • entrypoint
  • workdir
  • email
  • queue

For instance, in hello.yaml:

image: alpine
entrypoint: echo "hello world"
workdir: /path/to/your/scratch/directory
email: you@institution.edu
queue: batch

To generate a job script and submit this job to a scheduler running on the host machine:

plantit hello.yaml

If the job was submitted successfully, the job ID will be printed.

Usage

To show CLI docs run plantit -h. Besides the main plantit <config file>.yml command, a number of subcommands can be invoked with plantit <command>.

Commands

The following commands are available:

  • token: Retrieve a CyVerse authentication token.
  • compat: Check if the current system is compatible.
  • user: Retrieve the user's profile information.
  • list: List files in a collection.
  • stat: Get information about a file or collection.
  • pull: Download one or more files from a collection.
  • push: Upload one or more files to a collection.
  • exists: Check if a path exists in the data store.
  • create: Create a collection in the data store.
  • share: Share a file or collection with another user.
  • unshare: Revoke another user's access to your file or collection.
  • tag: Set metadata for a given file or collection.
  • tags: Get metadata for a given file or collection.
  • scripts: Generate job scripts for a container workflow.
  • submit: Submit jobs for a container workflow to a cluster.

Token

To request a CyVerse CAS authentication token, use the token command:

plantit token --username <your CyVerse username> --password <your CyVerse password>

The token can be passed via --token (-t) argument to authenticate subsequent commands.

Compat

The plantit compat command determines whether jobs can be submitted to the host system, affirming to stdout if the following conditions are met:

  • singularity is installed and available on the path
  • the CyVerse data store is reachable via iRODS or science APIs
    • for the former, the user must have run iinit to configure iCommands
  • SLURM is up and standard commands sbatch, squeue, sacct, etc are available

Otherwise the command terminates with an error signal and information on the missing or misconfigured dependencies is printed to stderr.

User

The user command can be used to retrieve public profile information for CyVerse users. For instance, to get my profile info:

plantit user -t <token> wbonelli

List

To list the contents of a collection in the data store, use the list command. For instance:

plantit list -t <token> /iplant/home/shared/iplantcollaborative/testing_tools/

Stat

To view metadata for a particular collection or object in the data store, use the stat command. For instance:

plantit stat -t <token> /iplant/home/shared/iplantcollaborative/testing_tools/

Pull

To download a single file from the data store to the current working directory, simply provide its full path:

plantit pull -t <token> /iplant/home/shared/iplantcollaborative/testing_tools/cowsay/cowsay.txt

To download all files from the /iplant/home/shared/iplantcollaborative/testing_tools/cowsay/ collection to the current working directory, just provide the collection path instead:

plantit pull -t <token> /iplant/home/shared/iplantcollaborative/testing_tools/cowsay/

Optional arguments are:

  • --local_path (-p): Local path to download files to
  • --include_pattern (-ip): File patterns to include (0+)
  • --force (-f): Whether to overwrite already-existing files

Push

To upload all files in the current working directory to the /iplant/home/<my>/<directory/ in the CyVerse Data Store, use:

plantit push -t <token> /iplant/home/<username>/<collection>/

Optional arguments include:

  • --local_path (-p): Local path to download files to
  • --include_pattern (-ip): File patterns to include (0+)
  • --include_name (-in): File names to include (0+)
  • --exclude_pattern (-ep): File patterns to exclude (0+)
  • --exclude_name (-en): File names to exclude (0+)

To upload a single file to the data store, provide the --local_path (-p) argument. For instance:

plantit push -t <token> /iplant/home/<username>/<collection/ -p /my/local/file.txt

If only include_...s are provided, only the file patterns and names specified will be included. If only exclude_...s section are present, all files except the patterns and names specified will be included. If you provide both include_... and exclude_... sections, the include_... rules will first be applied to generate a subset of files, which will then be filtered by the exclude_... rules.

Exists

To determine whether a particular path exists in the data store, use the exists command. For instance, to check if a collection exists:

plantit exists -t <token> /iplant/home/<username>/<collection

The --type option can be provided with value dir or file to verify that the given path is of the specified type.

Create

To create a new collection, use the create command:

plantit create -t <token> /iplant/home/<username>/<new collection name>

Share

To share a file or collection with another user, use the share command:

plantit share -t <token> /iplant/home/<username>/<collection> --username <user to share with> --permission <'read' or 'write'>

Note that you must provide both the --username and --permission flags.

Unshare

To revoke another user's access to your file or collection, use the unshare command:

plantit unshare -t <token> /iplant/home/<username>/<collection> --username <username>

This applies to both read and write permissions for the specified user.

Tag

To set metadata for a given file object or collection in your data store, use the tag command:

plantit tag <data object ID> -t <token> -a k1=v1 -a k2=v2

This applies the two given attributes to the data object (attributes must be formatted key=value).

Warning: this command is an overwrite, not an append. We do not support appending tags as there is no Terrain endpoint to add/remove individual metadata attributes. Note also that by default, key/value pairs are passed on the avus attribute of the request body rather than irods-avus, e.g.:

POST https://de.cyverse.org/terrain/secured/filesystem/<ID>/metadata
{
    "irods-avus": [],
    "avus": [
        {
            "attr": "some key"
            "value": "some value",
            "unit": ""
        }
    ]
}

To configure irods-avus attributes as well as or in place of standard attributes, use the --irods_attribute (-ia) option. Both standard and iRODS attributes can be used in the same invocation.

Tags

To retrieve the metadata describing a particular file object or collection, use the tags command:

plantit tags <data object ID> -t <token>

This will retrieve standard attributes by default. To retrieve iRODS attributes instead, use the --irods (-i) option.

Scripts

To generate SLURM job scripts for a container workflow, use the scripts command. For instance:

plantit scripts ...

Submit

To submit a container workflow as a job script on a cluster, use the submit command. For instance, to copy the contents of the current working directory to a cluster and submit the job defined in job.sh:

plantit submit -p . -j job.sh \
  --cluster_host <hostname or IP> \
  --cluster_user <user account name> \
  --cluster_key <private SSH key> \
  --cluster_target <location to copy job scripts and input files>

Development

First, clone the repo with git clone https://github.com/Computational-Plant-Science/plantit-cli.git.

Create a Python3 virtual environment, e.g. python3 -m venv venv, then install plantit and core dependencies with pip install .. Install testing and linting dependencies as well with `pip install ".[test]".

Tests

The tests can be run from the project root with pytest (or python3 -m pytest). Use -v for verbose mode and -n auto to run them in parallel on as many cores as your machine will spare.

Note: some tests required the CYVERSE_USERNAME and CYVERSE_PASSWORD environment variables. You can set these manually or put them in a .env file in the project root — pytest-dotenv will detect them in the latter case. Test cases will use this CyVerse account and its associated data store as a test environment. Each test case isolates its workspace to a folder named by GUID.

Markers

The full test suite should take 5-10 minutes to run, depending on the delay configured to allow the CyVerse Data Store to become consistent. This is 10 seconds per write operation, by default.

Note: The CyVerse data store is not immediately consistent and write operations may take some time to be reflected in subsequent reads. Tests must wait some unknown amount of time to allow the Data Store to update its internal state. If tests begin to fail intermittently, the DEFAULT_SLEEP variable in plantit/terrain/tests/conftest.py may need to be increased.

A fast subset of the tests can be run with pytest -S (short for --smoke). The smoke tests should complete in under a minute.