py-tesseract

Remote code execution with the GA4GH Task Execution API


License
MIT
Install
pip install py-tesseract==0.3.0

tesseract

tesseract is a library for running Python code remotely on systems that implement the GA4GH Task Execution API.

Quick Start

from __future__ import print_function

from tesseract import Tesseract, FileStore


def identity(n):
    return n


def say_hello(a, b):
    return "hello " + identity(a) + b


fs = FileStore("./test_store/")
r = Tesseract(fs, "http://localhost:8000")
r.with_resources(
    cpu_cores=1, ram_gb=4, disk_gb=None,
    docker="python:2.7", libraries=["cloudpickle"]
)

future = r.run(say_hello, "world", b="!")
result = future.result()
print(result)

r2 = r.clone().with_resources(cpu_cores=4)
f2 = r2.run(say_hello, "more", b="cpus!")
# Use a separate name so the cloned runner r2 is not overwritten.
result2 = f2.result()
print(result2)

Object store support

If you pass a Swift, S3, or Google Storage (gs) bucket URL to your FileStore, tesseract will attempt to detect your credentials for that system automatically.

To set up your environment for this, run the command for your storage system:

  • Google Storage - gcloud auth application-default login
  • Amazon S3 - aws configure
  • Swift - source openrc.sh
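
As a rough illustration of the mapping above, the following sketch picks the credential setup command from a bucket URL's scheme. The helper credential_setup_command and the SETUP_COMMANDS table are hypothetical and not part of tesseract; only the scheme names and commands come from the list above.

```python
# Hypothetical helper, not part of tesseract: maps a bucket URL scheme to the
# credential setup command listed above.
SETUP_COMMANDS = {
    "gs": "gcloud auth application-default login",
    "s3": "aws configure",
    "swift": "source openrc.sh",
}


def credential_setup_command(url):
    """Return the setup command for an object store URL, or None for local paths."""
    scheme, sep, _ = url.partition("://")
    if not sep:
        return None  # no scheme, e.g. a local path such as "./test_store/"
    return SETUP_COMMANDS.get(scheme.lower())


print(credential_setup_command("s3://your-bucket/path/"))
```

A local path such as "./test_store/" has no scheme, so the helper returns None and no credential setup is suggested.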

Input files

If your function expects an input file to be available at a given path, declare it with with_input:

r.with_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")

The first argument specifies where the file is stored; the second specifies the path where your function expects to find it.

Output files

If your function generates files during its run, specify them as shown below and tesseract will upload them to the designated bucket.

r.with_output("./relative/path/to/outputfile.txt")
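
To make the input and output settings concrete, here is a minimal sketch of a function that reads one file and writes another. The function count_lines and the temporary paths are made up for illustration. The code runs locally as written; the commented lines at the end show how it might be wired up with the with_input and with_output calls described above, which is an assumption and has not been tested against a Task Execution server.

```python
import os
import tempfile


def count_lines(in_path, out_path):
    """Count the lines in in_path and write the count to out_path."""
    with open(in_path) as src:
        n = sum(1 for _ in src)
    with open(out_path, "w") as dst:
        dst.write(str(n) + "\n")
    return n


# Local dry run with temporary files; tesseract is not needed for this part.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "yourfile.txt")
out_path = os.path.join(workdir, "outputfile.txt")
with open(in_path, "w") as f:
    f.write("a\nb\nc\n")
line_count = count_lines(in_path, out_path)
print(line_count)

# Remote version (assumed wiring; r is the Tesseract instance from the Quick Start):
# r.with_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")
# r.with_output("./relative/path/to/outputfile.txt")
# future = r.run(count_lines, "/home/ubuntu/yourfile.txt",
#                "./relative/path/to/outputfile.txt")
```

Running the function locally first is a cheap way to check its file handling before submitting it for remote execution.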

Resources