ExKaldi: A Python-based Extension Tool of Kaldi
The ExKaldi automatic speech recognition toolkit provides an interface between the Kaldi ASR toolkit and Python. Unlike other Kaldi wrappers, ExKaldi has the following features:
- Integrated APIs for building an ASR system, including feature extraction, GMM-HMM acoustic model training, N-gram language model training, decoding, and scoring.
- Tools to support training DNN acoustic models with deep learning frameworks such as TensorFlow.
- Support for CTC decoding.
The goal of ExKaldi is to help developers build high-performance ASR systems easily in Python.
Current version: 1.3.5. (The toolkit has only been tested on Ubuntu >= 16 with Python 3.6, 3.7, and 3.8, using GitHub Actions.)
- If you have not installed the Kaldi ASR toolkit, first clone its repository (Kaldi version 5.5 is expected).
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
Then follow these three installation guides to install and compile it.
less kaldi/INSTALL
less kaldi/tools/INSTALL
less kaldi/src/INSTALL
- Install ExKaldi, either with pip or from the source code in our GitHub project.
Install with pip:
$ pip install https://github.com/kpu/kenlm/archive/master.zip
$ pip install exkaldi
Install from source:
$ git clone https://github.com/wangyu09/exkaldi.git
$ cd exkaldi
$ bash quick_install.sh
- Check if it is installed correctly.
python3 -c "import exkaldi"
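For a slightly more thorough check, the following sketch reports whether the Python packages and a Kaldi executable are visible from the current environment. It assumes that `kenlm` is the import name of the KenLM Python module and that `compute-mfcc-feats` (a standard Kaldi binary) is a reasonable probe for whether Kaldi's executables are on `PATH`. The helper name `check_environment` is purely illustrative and is not part of ExKaldi.

```python
import importlib.util
import shutil


def check_environment(packages=("exkaldi", "kenlm"),
                      binaries=("compute-mfcc-feats",)):
    """Return a dict mapping each package or binary name to True if it is found.

    Hypothetical helper for illustration only; not an ExKaldi API.
    """
    status = {}
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported.
        status[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the Kaldi binary is reported as missing, the likely cause is that Kaldi's `src/*bin` directories have not been added to `PATH`.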
In the exkaldi/tutorials directory, we have prepared a simple tutorial that shows how to use the ExKaldi APIs to build an ASR system from scratch. The data comes from the LibriSpeech train_100_clean dataset. The tutorial covers the following steps:
- Extract and process MFCC feature.
- Train and query an N-gram language model.
- Train monophone GMM-HMM, build decision tree, and train triphone GMM-HMM.
- Train a DNN acoustic model with Tensorflow.
- Compile WFST decoding graph.
- Decode based on GMM-HMM and DNN-HMM.
- Process lattice and compute WER score.
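As background for the final scoring step, the sketch below shows how a word error rate is conventionally defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. This is a plain-Python illustration that is independent of ExKaldi's and Kaldi's own scoring tools; the function names are hypothetical. Passing phone sequences instead of word sequences gives a phone error rate in the same way.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(refs, hyps):
    """Corpus-level WER in percent: total edits / total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_words += len(ref_tokens)
    if total_words == 0:
        raise ValueError("reference transcripts contain no words")
    return 100.0 * total_edits / total_words


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate(["the cat sat on the mat"], ["the cat sit on mat"]))
```

The rate is accumulated over the whole corpus rather than averaged per utterance, so long utterances weigh more than short ones.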
We ran the following experiments to test the ExKaldi toolkit, and it performed well in each of them.
1. Perplexity of various language models. All systems were trained on the TIMIT training set and tested on the TIMIT test set. The scores shown in the table are perplexity (PPL) values.
|Kaldi baseline irstlm||14.41||---||---||---||---|
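For reference, perplexity is the exponentiated average negative log-probability per token. The minimal sketch below computes it from per-token log10 probabilities, the convention used in ARPA-format language models. The function name and the input values are illustrative only; they are neither ExKaldi's API nor results from these experiments.

```python
import math


def perplexity(log10_probs):
    """Perplexity from a flat list of per-token log10 probabilities."""
    if not log10_probs:
        raise ValueError("need at least one token probability")
    # Average log10 probability per token, then invert and exponentiate.
    avg_log10 = sum(log10_probs) / len(log10_probs)
    return 10.0 ** (-avg_log10)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4
    # (up to floating-point rounding).
    print(perplexity([math.log10(0.25)] * 5))
```

Toolkits differ in whether end-of-sentence tokens are counted, so perplexities are only comparable when computed under the same convention.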
2. Phone error rate (PER) of various GMM-HMM-based systems. All systems were trained on the TIMIT training set and tested on the TIMIT test set. The language model backend used in ExKaldi is KenLM. The results show that KenLM can be used to optimize the language model. In addition, with ExKaldi we selected the N-gram model according to its perplexity, which improved the performance of the ASR system.
|Kaldi baseline 2grams||32.54||26.17||23.63||21.54|
3. Phone error rate (PER) of two DNN-HMM-based systems. We trained our models with TensorFlow 2.3. The version of the PyTorch-Kaldi toolkit used in our experiment is 1.0.