Overlap-save method for fast convolution, like Scipy's fftconvolve, but memory-efficient

overlap, save, add, fast, convolution, convolve, fft, fftconvolve, fir, filter
pip install overlap-save==1.1.2


Fast-convolution via overlap-save: a partial drop-in replacement for scipy.signal.fftconvolve


  • 1D and 2D (both tested) and higher (untested) arrays
    • (Currently unsupported is convolving different-dimensional signals)
  • Real and complex arrays
  • Memory-mapped input/outputs fully supported (and tested)
  • Supports alternative FFT engines such as PyFFTW
  • Supports reflect-mode (signal assumed to reflect infinitely, instead of 0 outside its support; useful for avoiding edge effects)
  • Relatively straightforward to paralellize each step of the algorithm
  • Extensively unit-tested

When it can be used as a drop-in replacement for fftconvolve:

  • when you call fftconvolve with mode='same' and axes=None
  • See docs

Installation $ pip install overlap_save then import ols.

Usage Example (culled from unit tests):

import numpy as np
from scipy.signal import fftconvolve

# Generate a 100x100 signal array and a 10x10 filter array
nx = 100
nh = 10
x = np.random.randint(-10, 10, size=(nx, nx)) + 1.0
h = np.random.randint(-10, 10, size=(nh, nh)) + 1.0

# Compute the linear convolution using the FFT, keeping the center 100x100 samples
expected = fftconvolve(x, h, mode='same')

# Use overlap-save, computing the output in 6x5-sized chunks. Instead of one huge FFT, we do a
# several tiny ones
from ols import ols
actual = ols(x, h, [6, 5])

# The two will match
assert np.allclose(expected, actual)

API def ols(x, h, size=None, nfft=None, out=None, rfftn=None, irfftn=None, mode='constant', **kwargs) Perform multidimensional overlap-save fast-convolution.

As mentioned in the module docstring, the output of this function will be within machine precision of scipy.signal.fftconvolve(x, h, mode='same').

However, rather than computing three potentially-large FFTs (one for x, one for h, and an inverse FFT for their product), the overlap-save algorithm performs a sequence of smaller FFTs. This makes it appropriate for situations where you may not be able to store the signals' FFTs in RAM, or even cases where you cannot even store the signals themselves in RAM, i.e., when you have to memory-map them from disk.

x and h can be multidimensional (1D and 2D are extensively tested), but must have the same rank, i.e., len(x.shape) == len(h.shape).

size is a list of integers that specifies the sizes of the output that will be computed in each iteration of the overlap-save algorithm. It must be the same length as x.shape and h.shape. If not provided, defaults to [4 * x for x in h.shape], i.e., will break up the output into chunks whose size is governed by the size of h.

nfft is a list of integers that specifies the size of the FFT to be used. It's length must be equal to the length of size. Each element of this list must be large enough to store the linear convolution, i.e., all([nfft[i] >= size[i] + h.shape[i] - 1 for i in range(len(nfft))]) must be True. Set this to a multiple of small prime factors, which is the default.

If provided, the results will be stored in out. This is useful for memory-mapped outputs, e.g.

If not provided, rfftn and irfftn default to those in numpy.fft. Other implementations matching Numpy's, such as PyFFTW, will also work.

By default, mode='constant' assumes elements of x outside its boundaries are 0, which matches the textbook definition of convolution. mode='reflect' is also supported. It should be straightforward to add support for other modes supported by np.pad. Keyword arguments in **kwargs are passed to np.pad.

Development To simply develop this repo, instead of installing it, run the following (uses venv with Python 3—adapt for virtualenv if you are on Python 2):

$ git clone https://github.com/fasiha/overlap_save-py.git
$ cd overlap_save-py                  # clone the repo and change to its directory
$ git checkout -b MY-DEV-BRANCH       # check out a new git branch
$ python -m venv .                    # set up a venv (one-time)
$ source bin/activate                 # activate venv
$ pip install -r requirements.txt     # install module requirements
$ pip install -r requirements-dev.txt # install requirements for tests
$ nosetests -w .                      # run all tests. You may have to re-activate venv


1.1.2 support for numpy<1.15 (where numpy.flip had more limited behavior)—thanks again Matteo!

1.1.1 full complex support (thanks Matteo Bachetti! !1 & !2)

1.1.0 PyFFTW and mirror mode added