sonorus

Named after a spell in the Harry Potter universe that amplifies the sound of a speaker. In muggles' terminology, this is a repository of modules for audio and speech processing for, and on top of, machine-learning-based tasks such as speech-to-text.


Keywords
deep-learning, speech-recognition, speech-to-text, language-modelling
License
MIT
Install
pip install sonorus==0.1.1

Documentation

Getting Started:

Installation:

Install dependencies

The repository depends on kenlm, pyflashlight, fairseq and portaudio, which need to be installed before the pip-installable modules.

To install kenlm with Python bindings, refer to the kenlm GitHub repository.

To install pyflashlight with Python bindings, refer to its installation instructions. Note that the C++ build itself is not necessarily required for building the Python bindings. Furthermore, pyflashlight will soon be made pip-installable via PyPI.

To install fairseq, refer to the requirements and installation instructions in the fairseq GitHub repository. Note that the package currently available on PyPI is of version < 1.0, so installation from source is currently required. Once PyPI is updated with the latest fairseq package, it can be installed using pip.

pyaudio depends on portaudio. If you are not using conda, make sure portaudio is installed. For example, on Ubuntu it can be installed by executing:

sudo apt install portaudio19-dev

In a conda environment, the following installs pyaudio along with portaudio:

conda install pyaudio

Next, install the requirements by executing:

pip install -r requirements.txt

or install them using conda in a conda environment.

Finally, install the package using:

pip install sonorus
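
Before running the examples, it can help to confirm that the dependencies listed above are importable. The following is a minimal sketch using only the Python standard library. The import names (kenlm, flashlight, fairseq, pyaudio) are assumptions about how each package exposes its Python module and may differ with the installed version, and missing_modules is a hypothetical helper, not part of sonorus.

```python
import importlib.util

# Assumed top-level import names for the dependencies named in this README.
ASSUMED_MODULES = ["kenlm", "flashlight", "fairseq", "pyaudio"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

A module reported as not importable usually means the corresponding install step above was skipped or ran in a different environment.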

Environment set up:

Note: Environment set up is required only when using Google Cloud's speech-to-text API. For this, the Google application credentials must be set as an environment variable, e.g. by exporting:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-cloud-credentials.json
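
A quick way to catch a missing or mistyped credentials path before starting a streaming session is to check the variable from Python. This is a minimal sketch, not part of sonorus. The helper name credentials_path is hypothetical, and it only checks that the file exists, not that the credentials are valid.

```python
import os


def credentials_path(env=None):
    """Return the credentials file path from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    return path


if __name__ == "__main__":
    print("Using credentials at:", credentials_path())
```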

Sample running instructions:

  • To receive speech input from the microphone and print the transcription on the console using Facebook's on-device Wav2Vec2 model, made available by Hugging Face, execute:

python3 examples/streaming-stt.py

To modify the execution parameters of the on-device model, for example to provide a GPU device index when a GPU is available, run the program as:

python3 examples/streaming-stt.py --gpu_idx 0
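
For readers wondering how a flag like --gpu_idx typically maps to a compute device, here is an illustrative sketch. It is not the actual code of examples/streaming-stt.py. The default of -1 meaning CPU, and the helper names parse_args and device_string, are assumptions made for this example only.

```python
import argparse


def parse_args(argv=None):
    """Parse an illustrative --gpu_idx option."""
    parser = argparse.ArgumentParser(description="Illustrative streaming STT options")
    # Assumption: -1 is used here as a hypothetical "run on CPU" default.
    parser.add_argument("--gpu_idx", type=int, default=-1,
                        help="GPU device index; -1 selects the CPU")
    return parser.parse_args(argv)


def device_string(gpu_idx):
    """Map a GPU index to a PyTorch-style device string."""
    return f"cuda:{gpu_idx}" if gpu_idx >= 0 else "cpu"


if __name__ == "__main__":
    args = parse_args()
    print("Selected device:", device_string(args.gpu_idx))
```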

  • To use Google Cloud's speech-to-text instead, execute:

python3 examples/google-streaming-stt.py