cacheflow

Caching Workflow Engine


Keywords
workflow, execution, cache, caching, notebook
License
BSD-3-Clause
Install
pip install cacheflow==0.3

Documentation

CacheFlow

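CacheFlow is a caching workflow engine: it executes dataflows and reuses previously computed results where appropriate, so unchanged steps are not run again. It is designed to be extensible (new modules, storage formats, caching mechanisms, and executors can be added) and can either run workflows by itself or be embedded into other applications.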
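
To make the idea of "reusing previous results" concrete, here is a minimal sketch of result caching in a dataflow. It is not CacheFlow's actual API: the step dictionaries, `step_key`, and `run_dataflow` are hypothetical names made up for illustration. It assumes each step is identified by its function name, its parameters, and the cache keys of its inputs, so a step is re-executed only when something upstream of it changed.

```python
import hashlib
import json


def step_key(func_name, params, input_keys):
    """Derive a cache key from what a step computes and what it consumes."""
    payload = json.dumps(
        {"func": func_name, "params": params, "inputs": input_keys},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_dataflow(steps, funcs, cache, executed):
    """Run steps (already in topological order), reusing cached results.

    steps:    list of {"id", "func", "params", "inputs"} dicts (hypothetical format)
    funcs:    mapping from function name to a Python callable
    cache:    dict mapping cache key -> result, kept between runs
    executed: list that receives the ids of steps that actually ran
    """
    keys, results = {}, {}
    for step in steps:
        key = step_key(step["func"], step["params"],
                       [keys[i] for i in step["inputs"]])
        keys[step["id"]] = key
        if key not in cache:
            # Cache miss: compute the result from the upstream results.
            args = [results[i] for i in step["inputs"]]
            cache[key] = funcs[step["func"]](*args, **step["params"])
            executed.append(step["id"])
        results[step["id"]] = cache[key]
    return results


funcs = {
    "constant": lambda value: value,
    "add": lambda a, b: a + b,
    "scale": lambda x, factor: x * factor,
}
steps = [
    {"id": "a", "func": "constant", "params": {"value": 2}, "inputs": []},
    {"id": "b", "func": "constant", "params": {"value": 3}, "inputs": []},
    {"id": "sum", "func": "add", "params": {}, "inputs": ["a", "b"]},
    {"id": "out", "func": "scale", "params": {"factor": 10}, "inputs": ["sum"]},
]

cache = {}
executed = []
first = run_dataflow(steps, funcs, cache, executed)

# Change only the last step's parameter and run again with the same cache.
steps[3]["params"]["factor"] = 100
executed_again = []
second = run_dataflow(steps, funcs, cache, executed_again)
```

On the first run every step executes. On the second run only `out` is recomputed, because the keys of `a`, `b`, and `sum` are unchanged and their results are found in the cache. A real engine would also need persistent storage and a way to detect changes in module code, which this sketch leaves out.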

Goals

  • ☑ Python 3 workflow system
  • ☑ Executes dataflows from JSON files
  • ☐ Can also load from SQL database
  • ☐ Parallel execution
  • ☐ Streaming
  • ☑ Extensible: can add new modules, new storage formats, new caching mechanisms, new executors
  • ☐ Pluggable: extensions can be installed from PyPI without forking
  • ☑ Re-usable: can execute workflows by itself, but can also be embedded into applications. Some I plan on developing myself:
    • Literate programming app: snippets or modules embedded into a Markdown file, which are executed on render (similar to R Markdown). Results would be cached, making later renders fast
    • Integration into some of my NYU research projects (VisTrails Vizier, D3M)

Other ideas:

  • ☐ Use Jupyter kernels as backends to execute code (giving me quick access to all the languages they support)
  • ☐ Isolate script execution (to run untrusted Python/... code, for example with Docker)

Non-goals

  • Make a super-scalable and fast workflow execution engine: I'd rather build executors on top of Spark, Dask, or Ray than re-implement those

Status

Basic structures are here, extracted from D3M. Execution works.