Errand: Pythonic GPU and Accelerator Interface
errand is a Python module that enables an easy, scalable, and future-proof programming interface for accelerator hardwares such as GPUs.
errand makes use of conventional programming tools that you may be already familar with. For example, errand uses Nvidia CUDA compiler or AMD HIP compiler if needed. errand takes responsibilities of data movements between GPU and CPU so that you can focus on computation in CUDA or HIP.
Further documentation is available at errand.readthedocs.io
Installation
The easiest way to install errand is to use the pip python package manager.
>>> pip install errand
You can install errand from github code repository if you want to try the latest version.
>>> git clone https://github.com/grnydawn/errand.git >>> cd errand >>> python setup.py install
NumPy array example in CUDA(Nvidia) or HIP(AMD)
To run the example, create two source files in a folder as shown below, and run the Python script as usual. The example assumes that at least one of CUDA compiler (nvcc) and HIP compiler (hipcc) is usuable and GPU is available on your system.
>>> python main.py
Python code (main.py)
# This example shows how to add numpy arrays # using Errand with Cuda or Hip backend. import numpy as np from errand import Errand NROW = 10 NCOL = 20 a = np.ones((NROW, NCOL)) b = np.ones((NROW, NCOL)) c = np.zeros((NROW, NCOL)) # creates an errand context with an "order" with Errand("order.ord") as erd: # call NROW teams of NCOL gofers gofers = erd.gofers(NCOL, NROW) # build workshop with input(a, b) and output(c) workshop = erd.workshop(a, b, "->", c) # let gofers do their work at the workshop gofers.run(workshop) # do your work below while gofers are doing their work # check the result when the errand is completed if np.array_equal(c, a+b): print("SUCCESS!") else: print("FAILURE!")
Order code (order.ord)
[cuda, hip] // NROW teams are interpreted to Cuda/Hip blocks // NCOL gofers of a team are interpreted to Cuda/Hip threads int row = blockIdx.x; int col = threadIdx.x; // the input and output variables keep the convinience of numpy if (row < x.shape(0) && col < x.shape(1)) c(row, col) = a(row, col) + b(row, col); [openacc-c++] #pragma acc loop gang for (int row = 0; row < a.shape(0); row++) { #pragma acc loop worker for (int col = 0; col < a.shape(1); col++) { c(row, col) = a(row, col) + b(row, col); } } [pthread] int row = a.unravel_index(ERRAND_GOFER_ID, 0); int col = a.unravel_index(ERRAND_GOFER_ID, 1); if (row < a.shape(0) && col < a.shape(1) ) c(row, col) = a(row, col) + b(row, col);