R-3PO -- Richard's Parallel Processing Pipeline
A library built on top of Ray to make embarrassingly parallel problems embarrassingly easy.
Suppose you have lots of data files that all need to be processed in exactly the same way by the same function, and you want to save the results of that processing to a CSV file. This is an embarrassingly parallel problem: it should be embarrassingly easy.
And that's what R3PO aims to deliver: R3PO lets you do it with a YAML config file and three lines of code.
    job_name: count_produce
    output_path: /home/lieu/dev/r3po/sample/output_dir
    processes: 2
    source_file_part: .json
    source_path: /home/lieu/dev/r3po/sample/produce_log
    working_dir: /home/lieu/dev/r3po/sample/working_dir
    from r3po import jobbuilder, jobrunner

    # Import the function that will be called by your processes
    from count_fruits import count_fruits

    CONFIG_YAML_FP = './config.yaml'

    # Build jobs
    jobbuilder.build_jobs(CONFIG_YAML_FP)

    # Run jobs
    jobrunner.run_jobs(CONFIG_YAML_FP, count_fruits)
This will run the function count_fruits on all the JSON files in source_path, and save the results as CSVs in output_path (one row per JSON file).
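The count_fruits function itself is not shown above. As a rough sketch only, a worker for this job might look like the following. It assumes, hypothetically, that R3PO calls the function with the path of one source file and writes the returned dict as one CSV row, and that each log file holds a JSON list of produce names. The FRUITS tuple and the column names are made up for illustration; check the sample directory for the real function and its actual signature.

```python
import json
from collections import Counter

# Hypothetical: the fruits we want one CSV column for.
FRUITS = ("apple", "banana", "orange")


def count_fruits(json_path):
    """Tally selected fruits in one JSON log file.

    Assumption (not confirmed by the R3PO docs): the function receives
    the path to a single source file and returns a dict for one CSV row.
    Assumption: the file contains a JSON list of produce names,
    e.g. ["apple", "kale", "apple"].
    """
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)

    counts = Counter(items)

    # Fixed keys so that every file yields a row with the same columns.
    row = {"file": str(json_path), "total_items": len(items)}
    for fruit in FRUITS:
        row[fruit] = counts[fruit]  # Counter returns 0 for missing keys
    return row
```

Returning the same keys for every file keeps the rows uniform, which matters if the results from many files end up in one CSV.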
That's it! R3PO automatically handles the distribution of tasks to processes, saves your progress so you can stop and restart the job anytime, and logs all errors automatically.
Quickstart (worked example)
[TODO] In the meantime, see the sample directory.
pip3 install r3po