pytask-parallel

Parallelize the execution of tasks with pytask.


Keywords
pytask
License
MIT
Install
pip install pytask-parallel==0.4.1



Parallelize the execution of tasks with pytask-parallel, a plugin for pytask.

Installation

pytask-parallel is available on PyPI and Anaconda.org. Install it with

$ pip install pytask-parallel

# or

$ conda install -c conda-forge pytask-parallel

By default, the plugin uses concurrent.futures.ProcessPoolExecutor.

You can also select the executor from loky or ThreadPoolExecutor from the concurrent.futures module as the backend for executing tasks asynchronously.
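To make the idea of interchangeable backends concrete, here is a minimal sketch of how a backend name could map to an executor class. It is not the plugin's actual implementation: BACKENDS and run_with_backend are hypothetical names, the keys mirror the documented option values "processes" and "threads", and loky is left out so the example needs only the standard library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical lookup table (an assumption for illustration). The keys mirror
# the documented values of the parallel_backend option; loky is omitted here.
BACKENDS = {
    "processes": ProcessPoolExecutor,
    "threads": ThreadPoolExecutor,
}


def run_with_backend(backend, n_workers, func, inputs):
    """Run func on every input with the chosen executor, preserving input order."""
    executor_cls = BACKENDS[backend]
    with executor_cls(max_workers=n_workers) as executor:
        futures = [executor.submit(func, value) for value in inputs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # The threads backend is used here because it has no pickling requirements.
    print(run_with_backend("threads", 2, abs, [-1, -2, 3]))
```

Both executor classes share the same submit/result interface, which is what makes swapping the backend a one-line change in a design like this.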

Usage

To parallelize your tasks across many workers, pass an integer greater than 1 or 'auto' to the -n (or --n-workers) option of the command-line interface.

$ pytask -n 2
$ pytask --n-workers 2

# Starts os.cpu_count() - 1 workers.
$ pytask -n auto
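The meaning of 'auto' can be sketched in a few lines of Python. This is an illustration rather than the plugin's code: resolve_n_workers is a hypothetical helper, and the fallbacks for os.cpu_count() returning None and for single-core machines are assumptions added so the function always yields at least one worker.

```python
import os


def resolve_n_workers(value):
    """Translate the value given to -n / --n-workers into a worker count."""
    if value == "auto":
        # os.cpu_count() may return None; falling back to 1 is an assumption.
        cpu_count = os.cpu_count() or 1
        # Leave one core free, but never go below one worker (also an assumption).
        return max(cpu_count - 1, 1)
    n_workers = int(value)
    if n_workers < 1:
        raise ValueError("n_workers must be a positive integer or 'auto'.")
    return n_workers


if __name__ == "__main__":
    print(resolve_n_workers("auto"))
    print(resolve_n_workers(2))
```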

Using processes to parallelize the execution of tasks is useful for CPU-bound tasks such as numerical computations. A task is CPU-bound when its runtime is dominated by computation, and IO-bound when it spends most of its time waiting for input or output.

For IO-bound tasks, where the limiting factor is network responses or access to files, you can parallelize via threads.

$ pytask --parallel-backend threads
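The following sketch shows why threads help with IO-bound work. It uses only the standard library and does not involve pytask itself: fake_download is a hypothetical stand-in that simulates waiting on a network response with time.sleep. While one thread waits, the others can proceed, so the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.2):
    """Hypothetical IO-bound task: sleeping stands in for waiting on a server."""
    time.sleep(delay)
    return f"{name}: done"


def run_io_tasks(names, n_workers):
    """Run the simulated downloads in a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        results = list(executor.map(fake_download, names))
    return results, time.perf_counter() - start


results, elapsed = run_io_tasks(["a", "b", "c", "d"], n_workers=4)
print(results)
# Run one after another, the four tasks would need about 0.8 seconds.
print(f"elapsed: {elapsed:.2f}s")
```

For CPU-bound functions the same pattern would usually not yield a comparable speedup in standard CPython builds, which is why processes are the default backend.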

You can also set the options in a pyproject.toml.

# This is the default configuration. Note that parallelization is turned off.

[tool.pytask.ini_options]
n_workers = 1
parallel_backend = "processes"  # or loky or threads

Some implementation details

Parallelization and Debugging

It is not possible to combine parallelization with debugging. Therefore, passing --pdb or --trace deactivates parallelization.

If you parallelize the execution of your tasks using two or more workers, do not use breakpoint() or import pdb; pdb.set_trace() since both will cause exceptions.

Threads and warnings

Capturing warnings is not thread-safe. Therefore, warnings cannot be captured reliably when tasks are parallelized with --parallel-backend threads.
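The underlying issue can be reproduced with the standard library alone. In the sketch below, which is an illustration and not pytask's capturing code, the main thread opens warnings.catch_warnings and a different thread emits a warning. In standard CPython builds with default settings, the capture changes module-level state in warnings, so the main thread records a warning it did not emit. With several tasks capturing at the same time, warnings can therefore be attributed to the wrong task. The names noisy_task and capture_in_main_while_thread_warns are hypothetical.

```python
import threading
import warnings


def noisy_task(name):
    """Hypothetical task that emits a warning."""
    warnings.warn(f"warning from {name}", UserWarning)


def capture_in_main_while_thread_warns():
    """Capture warnings in the calling thread while another thread warns."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of suppressing repeated ones.
        warnings.simplefilter("always")
        worker = threading.Thread(target=noisy_task, args=("worker-thread",))
        worker.start()
        worker.join()
    return [str(item.message) for item in caught]


messages = capture_in_main_while_thread_warns()
print(messages)
```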

Changes

Consult the release notes to find out what is new.

Development

  • pytask-parallel does not call the pytask_execute_task_protocol hook specification/entry-point because pytask_execute_task_setup and pytask_execute_task need to be separated from pytask_execute_task_teardown. Thus, plugins which change this hook specification may not interact well with the parallelization.

  • There are two PRs for CPython that try to re-enable setting custom reducers, a feature that should work but currently does not. Here are the references.