kaldi-helpers

Scripts for preparing language data for use with Kaldi ASR


Keywords
automatic-speech-recognition, computational-linguistics, docker, kaldi, kaldi-helpers, python, script, speech, speech-to-text, transcription

Install
pip install kaldi-helpers==0.22

Documentation

CoEDL Kaldi Helpers

A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.

Read about setting up Docker to run these scripts.

For more information about data requirements, see the data guide.

Requirements

This pipeline relies on Python 3.6 and several open-source Python packages (listed here). It also assumes that Kaldi, sox and task are installed. We highly recommend using our Docker image.
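
If you are not using the Docker image, a quick check of the prerequisites can save some debugging later. The following is a minimal sketch and not part of Kaldi Helpers: the function names are hypothetical, it assumes that sox and task are expected on your PATH under exactly those names, and it treats Python 3.6 as a minimum version. Kaldi is not checked because it is a collection of binaries rather than a single command.

```python
import shutil
import sys

# Command-line tools the requirements mention. Assumption: both are
# installed under these exact names and should be reachable via PATH.
REQUIRED_TOOLS = ("sox", "task")

# Assumption: Python 3.6 is treated as a minimum, not an exact version.
MIN_PYTHON = (3, 6)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_recent_enough(version=sys.version_info, minimum=MIN_PYTHON):
    """Return True if `version` (major, minor, ...) meets `minimum`."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.6 or newer is expected")
    for tool in missing_tools():
        print(f"Not found on PATH: {tool}")
```

Running the script prints nothing when the interpreter version and both tools are in place.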

Tasks

This library uses the task tool to run the more complex processes automatically. Once you've set up Kaldi Helpers, you can run the various pipeline tasks we've developed; in the Docker image they work out of the box. You can read about these tasks here.

Workflow