biobb-remote

Biobb_remote is the Biobb module for remote execution via SSH.


Keywords
Bioinformatics, Workflows, BioExcel, Compatibility
License
Apache-2.0
Install
pip install biobb-remote==1.2.2

Documentation

biobb_remote

Introduction

Biobb_remote is a package that allows biobbs to be executed on remote sites through SSH.

ssh_credentials.py

Provides SSHCredentials to manage the generation and installation of SSH credentials

credentials = SSHCredentials(host='', userid='', generate_key=False, look_for_keys=True)
  • host (str): remote host name
  • userid (str): remote user name
  • generate_key (bool): Generate a new public/private key pair
  • look_for_keys (bool): Look for keys in the user's .ssh directory if no keys are set

Methods

(void) credentials.save(output_path, public_key_path=None, private_key_path=None, passwd=None)

Stores the SSHCredentials object in an external file

  • output_path (str): Path to file
  • public_key_path (str): Path to a standard public key file
  • private_key_path (str): Path to a standard private key file
  • passwd (str): Password to encrypt private key (optional)
(void) credentials.load_from_file(credentials_path, passwd=None)

Recovers SSHCredentials object from disk file

  • credentials_path (str): Path to packed credentials file.
  • passwd (str): Password to decrypt private key (optional)
(void) credentials.load_from_private_key_file(private_path, passwd=None)

Recovers SSHCredentials object from a private key file

  • private_path (str): Path to private key file.
  • passwd (str): Password to decrypt private key (optional)
(void) credentials.generate_key(nbits=2048)

Generates an RSA key pair

  • nbits (int): Number of bits of the generated key
(str) credentials.get_public_key(suffix='@biobb')

Returns a readable public key suitable for adding to authorized keys

  • suffix (str): Added to the key to identify it.
(str) credentials.get_private_key(passwd=None)

Returns a readable, possibly encrypted, private key

  • passwd (str): Password to encrypt private key (optional)
(bool) credentials.check_host_auth()

Checks for public_key in the remote .ssh/authorized_keys file. Requires the user's SSH access to the host

(void) credentials.install_host_auth(file_bck='bck')

Installs public_key in the remote .ssh/authorized_keys file. Requires the user's SSH access to the host

  • file_bck (str): Generates an authorized.file_bck file with original authorized keys
(void) credentials.remove_host_auth(file_bck='biobb')

Removes public_key from the remote .ssh/authorized_keys file. Requires the user's SSH access to the host

  • file_bck (str): Generates an authorized.file_bck file with original authorized keys
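
The methods above can be chained into a short setup routine. The following is a minimal sketch, not the package's own recipe: it only calls methods listed in this section, and it takes the credentials object as an argument so the call sequence can be read (and tested) without a remote host. The helper name prepare_credentials is hypothetical.

```python
def prepare_credentials(credentials, output_path, passwd=None, nbits=2048):
    """Create a key pair, pack it to disk, and authorize it on the host.

    `credentials` is expected to behave like the SSHCredentials object
    documented above; only methods listed in this README are called.
    """
    credentials.generate_key(nbits=nbits)         # new RSA key pair
    credentials.save(output_path, passwd=passwd)  # packed credentials file
    if not credentials.check_host_auth():         # key not yet authorized?
        credentials.install_host_auth(file_bck='bck')
    # Readable public key, e.g. to paste into authorized_keys by hand
    return credentials.get_public_key(suffix='@biobb')
```

With the package installed, the argument would presumably be built as SSHCredentials(host='cluster.example.org', userid='jdoe'), imported from biobb_remote.ssh_credentials; that import path and the host and user names are assumptions made for illustration.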

ssh_session.py

Class wrapping ssh operations

ssh_session = SSHSession(ssh_data=None, credentials_path=None, private_path=None, passwd=None)
  • ssh_data (SSHCredentials) : SSHCredentials object
  • credentials_path (str) : Path to packed credentials file to use
  • private_path (str): Path to private key file
  • passwd (str): Password to decrypt credentials (optional)
(str) ssh_session.run_command(command)

Runs command on remote. Returns stdout + stderr

  • command (str): Command line to execute
(bool | file_handle) ssh_session.run_sftp(oper, input_file_path, output_file_path='')

Runs SFTP session on remote

  • oper (str): Operation to perform, one of:
      * get: gets a single file from input_file_path (remote) to output_file_path (local)
      * put: puts a single file from input_file_path (local) to output_file_path (remote)
      * create: creates a file in output_file_path (remote) from the input_file_path string
      * file: opens a remote file in input_file_path for reading. Returns a file handle.
      * listdir: returns a list of files in the remote input_file_path
  • input_file_path (str): Input file path or input string
  • output_file_path (str): Output file path
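
As an illustration of how run_sftp and run_command fit together, here is a small sketch that uploads one file, lists the remote directory, and runs a shell command on it. It assumes only the two methods documented above; the helper name upload_and_list is hypothetical, and the session object is passed in rather than constructed so that no connection is opened here.

```python
import os
import posixpath
import shlex


def upload_and_list(session, local_path, remote_dir):
    """Upload a file, then inspect the remote directory in two ways.

    `session` is expected to behave like the SSHSession object above.
    """
    # Remote paths use POSIX separators regardless of the local platform
    remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
    session.run_sftp('put', local_path, remote_path)

    # 'listdir' takes the remote directory as input_file_path
    listing = session.run_sftp('listdir', remote_dir)

    # Quote the path so that spaces survive the remote shell
    details = session.run_command('ls -l ' + shlex.quote(remote_dir))
    return listing, details
```

Quoting the remote path with shlex.quote is a defensive choice of this sketch, since run_command receives a plain command line string.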

task.py

DataBundle Class to manage bundles of input/output files

data_bundle = DataBundle(bundle_id)
  • bundle_id (str): Id for the data bundle
data_bundle.add_file(file_path)

Adds a single file to the data bundle

  • file_path (str): Path to the file to add
data_bundle.add_dir(dir_path)

Adds all files from a directory

  • dir_path (str): Path to the directory to add
([str]) data_bundle.get_file_names()

Generates a list of names of included files

(str) data_bundle.to_json()

Generates a JSON dump
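
A bundle is typically filled from a mix of single files and directories. The sketch below shows one way to dispatch between add_file and add_dir; the helper name fill_bundle is hypothetical, and the bundle argument is assumed to expose the DataBundle methods listed above.

```python
import os


def fill_bundle(bundle, paths):
    """Add each path to the bundle, using add_dir for directories."""
    for path in paths:
        if os.path.isdir(path):
            bundle.add_dir(path)    # all files from the directory
        else:
            bundle.add_file(path)   # a single file
    return bundle.get_file_names()
```

With the package installed, the bundle would presumably be created as DataBundle('my_bundle'), where the bundle id is an arbitrary example value.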

Task: abstract module to handle remote tasks. Not for direct use; extend it to support specific queueing systems

Constants

task.UNKNOWN = 0
task.SUBMITTED = 1
task.RUNNING = 2
task.CANCELLED = 3
task.FINISHED = 4
task.CLOSING = 5
task.JOB_STATUS (dict)
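
To show how the numeric status codes might be interpreted in client code, the sketch below mirrors the constants locally. The mirror, the helper names, and the choice of CANCELLED and FINISHED as the states where polling can stop are assumptions of this example; the contents of task.JOB_STATUS are not reproduced here.

```python
# Local mirror of the documented constants (values taken from the list above)
UNKNOWN = 0
SUBMITTED = 1
RUNNING = 2
CANCELLED = 3
FINISHED = 4
CLOSING = 5

STATUS_NAMES = {
    UNKNOWN: 'UNKNOWN',
    SUBMITTED: 'SUBMITTED',
    RUNNING: 'RUNNING',
    CANCELLED: 'CANCELLED',
    FINISHED: 'FINISHED',
    CLOSING: 'CLOSING',
}


def status_name(status):
    """Readable label for a status code; unrecognized codes map to UNKNOWN."""
    return STATUS_NAMES.get(status, 'UNKNOWN')


def is_done(status):
    """True when no further polling is needed (assumed: cancelled or finished)."""
    return status in (CANCELLED, FINISHED)
```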

Methods

task = Task(host=None, userid=None, look_for_keys=True, debug_ssh=False)

Class to handle task execution

  • host (str): Remote host
  • userid (str): Remote user id
  • look_for_keys (bool): Look for keys available in user's .ssh directory
  • debug_ssh (bool): Open SSH session with debug activated
(void) task.load_data_from_file(file_path, mode='json')

Loads accumulated task data from external file

  • file_path (str): Path to file
  • mode (str): Format to use: json | pickle
(void) task.save(save_file_path, mode='json', verbose=False)

Saves current task status in an external file. Can be used to recover the session at a later time.

  • save_file_path (str): Path to file
  • mode (str): Format to use: json | pickle
  • verbose (bool): Print additional information
(void) task.set_credentials(credentials, passwd=None)

Loads SSH credentials from an SSHCredentials object or from an external file

  • credentials (SSHCredentials | str): SSHCredentials object or a path to a file containing the data
  • passwd (str): Password to decrypt private key when loaded from file (optional)
(void) task.set_private_key(private_path, passwd=None)

Inserts private key from external file

  • private_path (str): Path to private key file
  • passwd (str, optional): Password to decrypt private key
(void) task.load_host_config(host_config_path)

Loads a pre-defined host configuration file (json format)

  • host_config_path (str): Path to the configuration file
(void) task.set_custom_settings(ref_setting='default', patch=None, clean=False)

Generates a custom queue setting based on an existing one

  • ref_setting (str): Base settings to modify
  • patch (dict): Patch to apply
  • clean (bool): Clean existing settings
(void) task.prep_auto_settings(total_cores=0, nodes=0, cpus_per_task=1, num_gpus=0)

Generates queue configuration settings for balancing MPI/OMP/GPU.

  • total_cores (int): Approximate number of cores to use
  • nodes (int): Number of complete nodes to use (overrides total_cores)
  • cpus_per_task (int): OMP processes per MPI task to allocate
  • num_gpus (int): Number of GPUs per node to allocate
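
The arithmetic behind such a balance can be sketched as follows. This is not the library's algorithm, only an illustration of how the documented parameters could relate: the function name plan_layout, the cores_per_node parameter, and its default of 48 are all assumptions of this example.

```python
import math


def plan_layout(total_cores=0, nodes=0, cpus_per_task=1, cores_per_node=48):
    """Derive a node count and MPI task count from the requested resources.

    cores_per_node is hypothetical; a real host would define it in its
    configuration.
    """
    if cpus_per_task < 1 or cores_per_node < 1:
        raise ValueError('cpus_per_task and cores_per_node must be >= 1')
    if nodes > 0:
        # Complete nodes requested: this overrides total_cores
        cores = nodes * cores_per_node
    else:
        cores = total_cores
        nodes = max(1, math.ceil(total_cores / cores_per_node))
    # Each MPI task owns cpus_per_task OMP threads
    ntasks = max(1, cores // cpus_per_task)
    return {'nodes': nodes, 'ntasks': ntasks, 'cpus_per_task': cpus_per_task}
```

Under these assumptions, asking for 96 cores with 4 OMP threads per task yields 24 MPI tasks spread over 2 nodes.
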
(void) task.set_local_data_bundle(local_data_path, add_files=True)

Builds local data bundle from a local directory

  • local_data_path (str): Path to local data directory
  • add_files (bool): On create, add all files in the directory.
(void) task.send_input_data(remote_base_path, overwrite=True, new_only=True)

Uploads data bundle files to remote working dir

  • remote_base_path (str): Remote base path for all task activities. Each task will create a unique working dir (re-usable).
  • overwrite (bool): Upload files even if they already exist in the remote working dir.
  • new_only (bool): Overwrite only with newer files
(str) task.get_remote_py_script(python_import, files, command, properties='')

Generates one-line Python code to be executed in the queue script using python -c

  • python_import (str): Python import line(s) to include (; separated)
  • files (dict): File names to associate to biobb required path parameters
  • command (str): biobb class to launch
  • properties (dict | str): Either a dict, a path to a JSON or YAML config file, or a one-line JSON string with the required biobb parameters
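
To make the idea of a one-line script concrete, the sketch below builds such a line locally. It illustrates the shape of the result, not the exact string the method returns: the helper names, the MyBlock class, and the my_pkg import are hypothetical, and the trailing .launch() call follows the usual biobb convention for running a building block.

```python
import shlex


def build_one_liner(python_import, files, command, properties=None):
    """Compose 'import; Class(path_args, properties=...).launch()'."""
    args = [f'{name}={path!r}' for name, path in files.items()]
    if properties:
        args.append(f'properties={properties!r}')
    return f"{python_import}; {command}({', '.join(args)}).launch()"


def as_shell_command(line):
    """Wrap the one-liner for use with python -c in a queue script."""
    return 'python -c ' + shlex.quote(line)


line = build_one_liner(
    'from my_pkg.my_block import MyBlock',   # hypothetical import
    {'input_path': 'in.pdb', 'output_path': 'out.pdb'},
    'MyBlock',
    {'threshold': 5},
)
print(as_shell_command(line))
```

Passing the line through shlex.quote keeps the embedded quotes intact when the queue script is interpreted by a shell.
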
(str) task.get_remote_comm_line(command, files, use_biobb=False, properties='', cmd_settings='')

Generates a command line for queue script. Can be used to launch a biobb module or any command line remotely.

  • job_name (str): Job name to display (optional, used to identify queue jobs, and stdout/stderr logs)
  • command (str): Command to execute
  • files (dict): Input/output files. "--" prefix added if only a parameter name is provided
  • use_biobb (bool): Set to prepend biobb path on host
  • properties (dict): BioBB properties
  • cmd_settings (dict): Additional settings to add to the command line, pre-set bundles can be configured in host config data.
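
The "--" prefix rule for the files dict can be illustrated with a few lines of Python. This sketch reflects one reading of the description (keys that already start with a dash are kept, bare parameter names get "--" prepended); the helper name format_file_args is hypothetical and the library's actual formatting may differ.

```python
import shlex


def format_file_args(files):
    """Turn {'name': 'path'} pairs into command line arguments."""
    parts = []
    for name, path in files.items():
        # Bare parameter names get the long-option prefix
        flag = name if name.startswith('-') else '--' + name
        parts.extend([flag, shlex.quote(path)])
    return ' '.join(parts)


print(format_file_args({'input_pdb_path': 'in.pdb', '-o': 'out dir/out.gro'}))
```
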
(void) task.submit(job_name=None, queue_settings='default', modules=None, local_run_script='', conda_env='', save_file_path=None, poll_time=0)

Submits task to remote. Optionally waits until completion.

  • job_name (str): Job name to display in the queuing system. Stdout/stderr logs are named as job.name.(out|err). Optional, defaults to queue default behaviour.
  • queue_settings (str): Label for set of queue settings (defined in host configuration). Use 'custom' for user defined settings (see set_custom_settings)
  • modules (str): modules to activate (defined in host configuration)
  • conda_env (str): Conda environment to activate
  • local_run_script (str): Path to local script to run or a string with the script itself (identified by leading '#' tag)
  • save_file_path (str): Path to local task log file to update after submit (Default None)
  • poll_time (int): if set, polls periodically for job completion (seconds)
(void) task.cancel(remove_data=False)

Cancels running task

  • remove_data (bool): Removes the remote working directory.
(str) task.check_queue()

Checks queue status. Returns the output of the appropriate remote command

(void) task.check_job(update=True, poll_time=0, save_file_path=None)

Prints job status to stdout

  • update (bool): update status before printing it
  • poll_time (int): poll until job finished. Poll interval in seconds.
  • save_file_path (str): Path to local task log file to update status (Default None)
(void) task.get_remote_file(file)

Gets file from remote working dir

  • file (str): File name
([stdout, stderr]) task.get_logs()

Get queue logs

(void) task.get_output_data(local_data_path='', files_only=None, overwrite=True, new_only=True)

Downloads remote working dir contents to local path

  • local_data_path (str): Path to local directory
  • files_only ([str]): Only download the files in the list; if empty, download all files
  • overwrite (bool): Overwrite local files if they exist
  • new_only (bool): Overwrite only with newer files
(void) task.clean_remote()

Removes the remote working dir

slurm.py

Task Class extended to include specific settings for Slurm queueing system
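
Put together, the Task methods above suggest a two-phase workflow: submit and save the task state, then reload it later to collect results. The sketch below is one plausible ordering under that reading, not a prescribed sequence; the helper names and the 'biobb_demo' job name are hypothetical, and the task object is passed in so that only documented methods are exercised.

```python
def submit_job(task, credentials_path, host_config_path, local_dir,
               remote_base_path, script_path, task_file):
    """Phase 1: upload inputs, submit to the queue, persist task state."""
    task.set_credentials(credentials_path)
    task.load_host_config(host_config_path)
    task.set_local_data_bundle(local_dir, add_files=True)
    task.send_input_data(remote_base_path, overwrite=True, new_only=True)
    task.submit(job_name='biobb_demo', queue_settings='default',
                local_run_script=script_path, save_file_path=task_file)
    task.save(task_file, mode='json')


def collect_results(task, credentials_path, task_file, local_out, poll_time=30):
    """Phase 2: restore state, wait for completion, download, clean up."""
    task.load_data_from_file(task_file, mode='json')
    task.set_credentials(credentials_path)
    # Polls every poll_time seconds until the job has finished
    task.check_job(update=True, poll_time=poll_time, save_file_path=task_file)
    task.get_output_data(local_data_path=local_out, overwrite=True, new_only=True)
    task.clean_remote()
```

With the package installed, the task argument would presumably be an instance of the Slurm class from slurm.py; the class name and its import path are assumptions here. Splitting the work in two phases lets the local process exit while the job waits in the queue.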

conf/XXX.json

Host configuration files

Utilities

credentials

Generates key pairs to be consumed by other utilities

credentials [-h] [--user USERID] [--host HOSTNAME]
            [--pubkey_path PUBKEY_PATH] [--nbits NBITS] --keys_path
            KEYS_PATH [--privkey_path PRIVKEY_PATH]
            {create,get_pubkey,get_private}

scp_service

Simple sftp service

scp_service [-h] --keys_path KEYS_PATH [-i INPUT_FILE_PATH]
                   [-o OUTPUT_FILE_PATH]
                   {get,put,create,file,listdir}

ssh_command

Simple remote ssh command

ssh_command [-h] --keys_path KEYS_PATH [command [command ...]]

slurm_test

Complete set of functions to manage slurm submissions remotely

slurm_test [-h] --keys_path KEYS_PATH [--script SCRIPT_PATH]
                  [--local_data LOCAL_DATA_PATH] [--remote REMOTE_PATH]
                  [--queue_settings Q_SETTINGS] [--module MODULE]
                  [--task_data TASK_FILE_PATH]
                  {submit,queue,cancel,status,get_data,put_data}
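
When driving slurm_test from a script, the argument list can be assembled programmatically. The sketch below uses only the flags and operations shown in the usage text above; the helper name slurm_test_argv and the example paths are hypothetical.

```python
import shlex

OPERATIONS = ('submit', 'queue', 'cancel', 'status', 'get_data', 'put_data')


def slurm_test_argv(operation, keys_path, script=None, local_data=None,
                    remote=None, queue_settings=None, module=None,
                    task_data=None):
    """Build the argument list for the slurm_test utility."""
    if operation not in OPERATIONS:
        raise ValueError(f'unknown operation: {operation}')
    argv = ['slurm_test', '--keys_path', keys_path]
    optional = (
        ('--script', script),
        ('--local_data', local_data),
        ('--remote', remote),
        ('--queue_settings', queue_settings),
        ('--module', module),
        ('--task_data', task_data),
    )
    for flag, value in optional:
        if value is not None:
            argv += [flag, value]
    argv.append(operation)   # the operation is the final positional argument
    return argv


argv = slurm_test_argv('submit', 'keys.pkl', script='run.sh',
                       remote='/scratch/jdoe', task_data='task.json')
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv, check=True) on a machine where the utility is installed.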

Version

v1.2.2 November 2021

Copyright & Licensing

This software has been developed in the MMB group (http://mmb.irbbarcelona.org) at the BSC (http://www.bsc.es/) & IRB (https://www.irbbarcelona.org/) for the European BioExcel (http://bioexcel.eu/), funded by the European Commission (EU H2020 675728).

Licensed under the GNU Lesser General Public License v2.1, see the file LICENSE for details.