Pythonic accelerator interface

pip install errand==0.2.10


Errand: Pythonic GPU and Accelerator Interface

errand is a Python module that provides an easy, scalable, and future-proof programming interface for accelerator hardware such as GPUs.

errand builds on conventional programming tools that you may already be familiar with. For example, it invokes the Nvidia CUDA compiler or the AMD HIP compiler when needed. errand handles data movement between the GPU and the CPU so that you can focus on the computation in CUDA or HIP.

Further documentation is available at


The easiest way to install errand is with pip, the Python package manager.

>>> pip install errand

To try the latest version, you can install errand from its GitHub repository.

>>> git clone
>>> cd errand
>>> python install
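
After either installation route, it can be useful to confirm which version Python actually sees. The snippet below is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and is not part of errand.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("errand")
if version is None:
    print("errand is not installed in this environment")
else:
    print(f"errand {version} is installed")
```

The helper returns `None` instead of raising, so it can be called safely before errand has been installed.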

NumPy array example in CUDA (Nvidia) or HIP (AMD)

To run the example, create the two source files shown below in the same folder, then run the Python script as usual. The example assumes that at least one of the CUDA compiler (nvcc) or the HIP compiler (hipcc) is usable and that a GPU is available on your system.

>>> python
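
Before running the example, you may want to check whether either compiler can be found. The following is a minimal sketch using the standard-library `shutil.which`; the helper name `find_gpu_compilers` is hypothetical, and the check only looks on `PATH`. It does not confirm that a GPU is present or that the compiler works.

```python
import shutil


def find_gpu_compilers(names=("nvcc", "hipcc")):
    """Map each compiler name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


compilers = find_gpu_compilers()
available = [name for name, path in compilers.items() if path]

if available:
    print("Found compiler(s):", ", ".join(available))
else:
    print("Neither nvcc nor hipcc was found on PATH")
```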

Python code (

# This example shows how to add NumPy arrays
# using Errand with the CUDA or HIP backend.

import numpy as np
from errand import Errand

NROW = 10
NCOL = 20

a = np.ones((NROW, NCOL))
b = np.ones((NROW, NCOL))
c = np.zeros((NROW, NCOL))

# creates an errand context with an "order"
with Errand("order.ord") as erd:

        # call NROW teams of NCOL gofers
        gofers = erd.gofers(NCOL, NROW)

        # build workshop with input(a, b) and output(c)
        workshop = erd.workshop(a, b, "->", c)

        # let gofers do their work at the workshop

        # do your work below while gofers are doing their work

# check the result when the errand is completed
if np.array_equal(c, a+b):
        print("SUCCESS!")
else:
        print("FAILURE!")

Order code (order.ord)

[cuda, hip]

        // the NROW teams are mapped to CUDA/HIP blocks
        // the NCOL gofers of each team are mapped to CUDA/HIP threads

        int row = blockIdx.x;
        int col = threadIdx.x;

        // the input and output variables keep the convenience of numpy

        if (row < a.shape(0) && col < a.shape(1))
                c(row, col) = a(row, col) + b(row, col);


        #pragma acc loop gang
        for (int row = 0; row < a.shape(0); row++) {

                #pragma acc loop worker
                for (int col = 0; col < a.shape(1); col++) {
                        c(row, col) = a(row, col) + b(row, col);
                }
        }

        int row = a.unravel_index(ERRAND_GOFER_ID, 0);
        int col = a.unravel_index(ERRAND_GOFER_ID, 1);

        if (row < a.shape(0) && col < a.shape(1) )
                c(row, col) = a(row, col) + b(row, col);