Tools for real-time speech processing and keyword spotting


Keywords
kws, hotword, keyword, vad, utterance, voice-command, speech
License
Other
Install
pip install pyrtstools==0.2.9

Documentation

PyRTSTools


Introduction

Python Real Time Speech Tools is a collection of classes for building real-time speech processing pipelines for voice user interfaces.

Disclaimer: This is an early version designed to provide a voice command detection pipeline for LinTO. However, the elements are designed to be generic and can be used for other purposes.

Features

pyrtstools features different blocks:

  • Audio acquisition
  • Voice activity detection
  • Feature extraction
  • Keyword spotting

All the elements are designed to be easy to use and easy to interconnect.

Installation

To install the package you need Python 3 with pip and setuptools installed.

Required libraries are:

  • portaudio19-dev (for PyAudio microphone input)

The Python dependencies are installed automatically. (Note that this may take some time, as some of them, such as numpy and tensorflow, are fairly large.)

pypi

sudo pip3 install pyrtstools
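
A quick sanity check after installation (not part of the original instructions) is to make sure the package imports cleanly:

python3 -c "import pyrtstools"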

From source

git clone https://github.com/linto-ai/pyrtstools.git
cd pyrtstools
sudo ./setup.py install

Note for installation on ARM

pyrtstools requires tensorflow>=2.0.0, however wheels for ARM stop at 1.14 on piwheels and PyPI. You must install tensorflow 2.0.0 from the compiled wheel before installing pyrtstools. The .whl file can be found here:

wget https://github.com/lhelontra/tensorflow-on-arm/releases/download/v2.0.0/tensorflow-2.0.0-cp37-none-linux_armv7l.whl
pip install tensorflow-2.0.0-cp37-none-linux_armv7l.whl
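
You can then confirm that the expected TensorFlow version is picked up before installing pyrtstools (a quick check, not part of the original instructions):

python3 -c "import tensorflow as tf; print(tf.__version__)"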

Usage

Here is a simple pipeline designed to detect a hotword from the microphone.

import pyrtstools as rts

def on_detect(i, v):
    print("Detected keyword {} with confidence {}".format(i, v))

audioParam = rts.listenner.AudioParams() # Holds signal parameters
listenner = rts.listenner.Listenner(audioParam) # Microphone input
btn = rts.transform.ByteToNum(normalize=True) # Converts the raw signal to numerical values
featParams = rts.features.MFCCParams() # Holds MFCC feature parameters
mfcc = rts.features.SonopyMFCC(featParams) # Extracts MFCC features
kws = rts.kws.KWS("/path/to/your-model") # Hotword spotting
kws.on_detection = on_detect # Callback called on keyword detection
pipeline = rts.Pipeline([listenner, btn, mfcc, kws]) # Holds the elements and links them
pipeline.start() # Starts all the elements
try:
    listenner.join() # Blocks execution until the microphone stops
except KeyboardInterrupt:
    pipeline.close()
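
The callback receives the index of the detected keyword and its confidence score. As a sketch (the hotword labels and threshold below are placeholders, not values shipped with the library), the callback can map that index to a human-readable name and only react above a confidence threshold:

HOTWORDS = ["linto", "stop"]  # Placeholder labels, must match your model's outputs
THRESHOLD = 0.9               # Arbitrary confidence threshold

def on_detect(index, confidence):
    # Ignore low-confidence detections
    if confidence < THRESHOLD:
        return
    # Fall back to "unknown" if the index is outside the label list
    name = HOTWORDS[index] if index < len(HOTWORDS) else "unknown"
    print("Detected hotword '{}' (confidence {:.2f})".format(name, confidence))

kws.on_detection = on_detect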

Every block is located in a subpackage:

  • Audio acquisition: pyrtstools.listenner
  • Voice activity detection: pyrtstools.vad
  • Features extraction: pyrtstools.features
  • Keyword spotting: pyrtstools.kws
  • Signal transformation: pyrtstools.transform
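
For instance, the subpackages listed above can be imported directly (only the names from the list are used here):

from pyrtstools import listenner, vad, features, kws, transform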

Every element and class is documented.

Licence

This project is under the AGPLv3 licence; feel free to use and modify the code under those terms. See LICENCE.

Used libraries