pytorch-speech-features

PyTorch Speech Feature extraction


Keywords
asr, audio-processing, speech, speech-processing
License
MIT
Install
pip install pytorch-speech-features==0.0.3

pytorch_speech_features

A simple PyTorch reimplementation of the python_speech_features library.

Uses

  • Interpretability experiments - All audio processing operations are differentiable, so results can be backpropagated to the original signal tensor.
  • Hybrid model design - Parametric (learnable) operations can be placed at different stages of audio processing.
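
As a rough illustration of both points, the sketch below backpropagates from a single feature value to the raw signal and to a learnable parameter. It uses only core PyTorch. The `log_frame_energy` function, the learnable `alpha` pre-emphasis coefficient, and the frame sizes are hypothetical stand-ins chosen for this example; they are not taken from pytorch_speech_features.

```python
import torch

# Hypothetical learnable pre-emphasis coefficient (the "parametric" stage).
alpha = torch.nn.Parameter(torch.tensor(0.97))

def log_frame_energy(signal, frame_len=400, frame_step=160):
    """Stand-in feature, NOT the library's code: log energy per windowed frame."""
    # Pre-emphasis: y[0] = x[0], y[n] = x[n] - alpha * x[n-1]
    emphasized = torch.cat([signal[:1], signal[1:] - alpha * signal[:-1]])
    # Slice into overlapping frames of shape (num_frames, frame_len)
    frames = emphasized.unfold(0, frame_len, frame_step)
    window = torch.hamming_window(frame_len, periodic=False, dtype=signal.dtype)
    energy = ((frames * window) ** 2).sum(dim=1)
    return torch.log(energy + 1e-10)

torch.manual_seed(0)
signal = torch.randn(16000, requires_grad=True)  # 1 s of noise at 16 kHz
features = log_frame_energy(signal)

# Backpropagate from one frame's feature to the input samples and to alpha.
features[10].backward()
saliency = signal.grad.abs()  # per-sample influence on frame 10
```

Only the samples that feed frame 10 receive a non-zero gradient, which is the kind of signal-level attribution an interpretability experiment would inspect; `alpha.grad` is populated as well, so the coefficient could be trained alongside a downstream model.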

Example use

Installation

Install from PyPI

pip install pytorch-speech-features

Install from GitHub

git clone https://github.com/Debjoy10/pytorch_speech_features
cd pytorch_speech_features
python setup.py develop

Usage

The functions are the same as in python_speech_features (refer to its documentation for arguments and defaults).

Instead of passing the input signal as a list or NumPy array, pass a tensor (both 'cpu' and 'cuda' devices are supported).

See example use given above.

Supported features:

  • Mel Frequency Cepstral Coefficients
  • Filterbank Energies
  • Log Filterbank Energies
  • Spectral Subband Centroids

Testing

Two things are tested for pytorch_speech_features operations:

  1. Similarity to python_speech_features outputs.
  2. Gradient correctness via autograd gradcheck.

Find the testing Python notebook here -

Open In Colab
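
The two checks can be sketched on a toy operation as follows. This is not the project's test notebook: the pre-emphasis filter is written here twice (once in NumPy, once in PyTorch) purely as a stand-in, and the function names, coefficient, and tolerances are assumptions made for the example.

```python
import numpy as np
import torch

def preemphasis_np(signal, coeff=0.95):
    """NumPy reference (stand-in): y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def preemphasis_torch(signal, coeff=0.95):
    """PyTorch version of the same filter (stand-in, differentiable)."""
    return torch.cat([signal[:1], signal[1:] - coeff * signal[:-1]])

rng = np.random.default_rng(0)
x_np = rng.standard_normal(64)
# gradcheck compares against finite differences, so use double precision.
x_t = torch.tensor(x_np, dtype=torch.float64, requires_grad=True)

# 1. Output similarity between the NumPy and PyTorch implementations.
outputs_match = np.allclose(preemphasis_np(x_np),
                            preemphasis_torch(x_t).detach().numpy())

# 2. Gradient correctness: analytical vs. numerical Jacobian.
grads_ok = torch.autograd.gradcheck(preemphasis_torch, (x_t,),
                                    eps=1e-6, atol=1e-4)
```

`torch.autograd.gradcheck` raises an error on a mismatch by default, so reaching the end with `grads_ok` set to `True` means the analytical gradients agree with finite differences within the given tolerances.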

Citation

@misc{https://doi.org/10.5281/zenodo.8021586,
  doi = {10.5281/ZENODO.8021586},
  url = {https://zenodo.org/record/8021586},
  author = {{Debjoy Saha}},
  title = {Debjoy10/pytorch_speech_features: Release v0.0.1},
  publisher = {Zenodo},
  year = {2023},
  copyright = {Open Access}
}

References

  • python_speech_features library
  • Sample audio file english.wav