tesseract
tesseract is a library for running Python code remotely on systems that implement the GA4GH Task Execution API.
Quick Start
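from __future__ import print_function

from tesseract import Tesseract, FileStore


def identity(n):
    return n


def say_hello(a, b):
    return "hello " + identity(a) + b


# FileStore location: a local directory here; an object-store bucket URL also works.
fs = FileStore("./test_store/")
# Submit work to the Task Execution API server at this address.
r = Tesseract(fs, "http://localhost:8000")
r.with_resources(
    cpu_cores=1, ram_gb=4, disk_gb=None,
    docker="python:2.7", libraries=["cloudpickle"]
)

# run() returns a future; result() waits for the remote task and returns its value.
future = r.run(say_hello, "world", b="!")
result = future.result()
print(result)

# clone() gives a separate runner, here requesting 4 CPU cores instead of 1.
r2 = r.clone().with_resources(cpu_cores=4)
f2 = r2.run(say_hello, "more", b="cpus!")
result2 = f2.result()
print(result2)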
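Each run ends up as a task on a GA4GH Task Execution Service (TES) server. The sketch below, which runs on Python 3 with only the standard library, builds a TES-style task document for the resources requested in the quick start. It is illustrative only: the helper `build_tes_task`, the task name, and the placeholder command are hypothetical, and the field names follow the TES v1 task schema (`resources`, `executors`, `inputs`, `outputs`) rather than tesseract's actual request, which may differ.

```python
import json


def build_tes_task(docker, command, cpu_cores=1, ram_gb=None, disk_gb=None):
    """Hypothetical helper: assemble a TES v1-style task message as a dict.

    Resource fields passed as None are left out, so the server can apply
    its own defaults.
    """
    resources = {
        key: value
        for key, value in (
            ("cpu_cores", cpu_cores),
            ("ram_gb", ram_gb),
            ("disk_gb", disk_gb),
        )
        if value is not None
    }
    return {
        "name": "tesseract-sketch",  # hypothetical task name
        "resources": resources,
        "executors": [{"image": docker, "command": command}],
        "inputs": [],
        "outputs": [],
    }


# Mirrors with_resources(cpu_cores=1, ram_gb=4, disk_gb=None, docker="python:2.7");
# the command is a placeholder, not what tesseract really executes.
task = build_tes_task("python:2.7", ["python", "run_task.py"], cpu_cores=1, ram_gb=4)
print(json.dumps(task, indent=2))
```

Leaving `disk_gb=None` out of the document, instead of sending `null`, is a design choice in this sketch.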
Object store support
If you give your FileStore a Swift, S3, or Google Storage (gs) bucket URL, tesseract will attempt to detect your credentials for that system automatically.
To set up your environment for this, run the following commands:
- Google Storage -
gcloud auth application-default login
- Amazon S3 -
aws configure
- Swift -
source openrc.sh
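One way to see which of the setup commands above applies to a given store is to look at the URL scheme. The Python 3 sketch below does this with `urllib.parse`. The helper `credential_hint` and the exact scheme prefixes (`gs`, `s3`, `swift`) are assumptions for illustration, not tesseract's own detection logic.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the setup commands listed above.
# The scheme names are assumptions; check which prefixes your FileStore accepts.
CREDENTIAL_HINTS = {
    "gs": "gcloud auth application-default login",
    "s3": "aws configure",
    "swift": "source openrc.sh",
}


def credential_hint(store_url):
    """Return the setup command for an object-store URL, or None for a local path."""
    scheme = urlparse(store_url).scheme.lower()
    return CREDENTIAL_HINTS.get(scheme)


for url in ["gs://my-bucket/results/", "s3://my-bucket/results/", "./test_store/"]:
    print(url, "->", credential_hint(url))
```

A local path such as `./test_store/` has no scheme, so no credentials are needed.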
Input files
If your function expects an input file to be available at a given path, add:
r.with_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")
The first argument is where the file is currently stored; the second is the path at which your function expects to find it.
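In TES terms, an input pairs a source URL with a path inside the task's container. The Python 3 sketch below shows that pairing for the `with_input` call above. The helper `tes_input` is hypothetical; the `url`/`path`/`type` fields follow the TES v1 input schema, which describes the container path as absolute, so the sketch rejects relative paths.

```python
import posixpath


def tes_input(url, container_path):
    """Hypothetical helper: describe one input file as a TES-style input entry.

    `url` is where the file lives (e.g. an s3:// URL); `container_path` is
    where the function expects to find it inside the container.
    """
    if not posixpath.isabs(container_path):
        raise ValueError("container path must be absolute: %r" % container_path)
    return {"url": url, "path": container_path, "type": "FILE"}


entry = tes_input("s3://your-bucket/path/to/yourfile.txt", "/home/ubuntu/yourfile.txt")
print(entry)
```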
Output files
If your function writes files during its run, declare them as shown below and tesseract will upload them to the designated bucket.
r.with_output("./relative/path/to/outputfile.txt")
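The Python 3 sketch below shows how a relative output path could map to a TES-style output entry. It rests on two assumptions that are not documented tesseract behaviour: the function runs in a working directory (the hypothetical `/tmp/work`) inside the container, and each file is uploaded under the FileStore URL at the same relative location. The helper `tes_output` is hypothetical.

```python
import posixpath


def tes_output(store_url, relative_path, workdir="/tmp/work"):
    """Hypothetical helper: map a relative output path to a TES-style output entry.

    Assumes the function runs in `workdir` inside the container and that the
    file is uploaded under `store_url` at the same relative location.
    """
    rel = posixpath.normpath(relative_path)
    # Reject paths that would escape the working directory.
    if posixpath.isabs(rel) or rel == ".." or rel.startswith("../"):
        raise ValueError("output path must stay inside the working directory")
    return {
        "path": posixpath.join(workdir, rel),
        "url": store_url.rstrip("/") + "/" + rel,
        "type": "FILE",
    }


print(tes_output("s3://your-bucket/outputs/", "./relative/path/to/outputfile.txt"))
```

Normalizing the path first means `./relative/...` and `relative/...` produce the same destination URL.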