Phonetisaurus for Python

Python wrapper for the excellent phonetisaurus grapheme-to-phoneme (G2P) tool (license).

Includes pre-built binaries for x86_64 and Raspberry Pi (armv6l, armv7l, aarch64) systems.

Requirements

For x86_64 systems:

$ pip install phonetisaurus

For Raspberry Pi, see Releases for compatible wheels:

- Raspberry Pi 0/1: phonetisaurus-<VERSION>-py3-none-linux_armv6l.whl
- Raspberry Pi 2/3/4 (32-bit): phonetisaurus-<VERSION>-py3-none-linux_armv7l.whl
- Raspberry Pi 3/4 (64-bit): phonetisaurus-<VERSION>-py3-none-linux_aarch64.whl
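
If you are unsure which wheel matches your board, a small check like the one below may help. It is only a sketch: it assumes that Python's platform.machine() reports one of the architecture strings used in the wheel names above, which may not hold on every OS image (for example, a 64-bit kernel with a 32-bit userland). The wheel_suffix helper is a hypothetical name for illustration and is not part of phonetisaurus.

```python
import platform

# Architecture strings taken from the wheel names listed above.
WHEEL_ARCHES = {"armv6l", "armv7l", "aarch64"}


def wheel_suffix(machine=None):
    """Return the wheel filename suffix for a machine type, or None if no wheel is listed."""
    machine = machine or platform.machine()
    if machine in WHEEL_ARCHES:
        return f"py3-none-linux_{machine}.whl"
    return None  # e.g. x86_64: install from PyPI with pip instead


if __name__ == "__main__":
    print(platform.machine(), "->", wheel_suffix())
```

A result of None means no Raspberry Pi wheel is listed for that machine type; on x86_64 the plain pip install applies.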

To train a model, start with a lexicon formatted like the CMU Pronouncing Dictionary, with one word per line followed by its phonemes:

word1 phoneme1 phoneme2 ...
word2 phoneme1 phoneme2 phoneme3 ...

With the lexicon saved to lexicon.dict, run:

$ phonetisaurus train --model /path/to/write/g2p.fst /path/to/lexicon.dict

You may supply more than one lexicon.

See phonetisaurus train --help for more options.
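
The training step can also be driven from Python. The following is a minimal sketch, not part of the package: write_lexicon, train_command, and train are hypothetical helper names, the phoneme symbols in the usage note are toy data, and it assumes that additional lexicons are passed as extra positional arguments after the first one.

```python
import subprocess
from pathlib import Path


def write_lexicon(path, entries):
    """Write {word: [phoneme, ...]} as one 'word phoneme1 phoneme2 ...' line per word."""
    lines = [" ".join([word, *phonemes]) for word, phonemes in entries.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def train_command(model_path, lexicon_paths):
    """Build the 'phonetisaurus train' command line for one or more lexicons."""
    # Assumption: extra lexicons follow the first as positional arguments.
    return ["phonetisaurus", "train", "--model", str(model_path),
            *[str(p) for p in lexicon_paths]]


def train(model_path, lexicon_paths):
    """Run training; raises CalledProcessError if the command exits with an error."""
    subprocess.run(train_command(model_path, lexicon_paths), check=True)
```

For example, write_lexicon("lexicon.dict", {"hello": ["HH", "AH", "L", "OW"]}) followed by train("g2p.fst", ["lexicon.dict"]) would write a one-word lexicon and train on it. A real lexicon needs far more entries to produce a useful model.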

To predict pronunciations with a trained model, run:

$ phonetisaurus predict --model /path/to/g2p.fst word1 word2 ...

If no words are given on the command line, they are read line by line from standard input.

Optionally, supply one or more --lexicon /path/to/lexicon.dict arguments so that pronunciations of known words are taken from the lexicon rather than guessed.

See phonetisaurus predict --help for more options.
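
To illustrate the idea behind --lexicon, the sketch below parses a lexicon in the format shown earlier and separates words that already have a pronunciation from those that would need a guess. It is an illustration under stated assumptions, not how phonetisaurus implements the option: load_lexicon, split_known, and predict_command are hypothetical helper names, and words are matched by exact string comparison.

```python
def load_lexicon(path):
    """Parse 'word phoneme1 phoneme2 ...' lines into {word: [[phoneme, ...], ...]}."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # A word may appear on several lines with different pronunciations.
            lexicon.setdefault(parts[0], []).append(parts[1:])
    return lexicon


def split_known(words, lexicon):
    """Separate words already in the lexicon from those that need a G2P guess."""
    known = {w: lexicon[w] for w in words if w in lexicon}
    unknown = [w for w in words if w not in lexicon]
    return known, unknown


def predict_command(model_path, lexicon_paths=()):
    """Build a 'phonetisaurus predict' command that reads words from standard input."""
    cmd = ["phonetisaurus", "predict", "--model", str(model_path)]
    for path in lexicon_paths:
        cmd += ["--lexicon", str(path)]
    return cmd
```

The command list can be passed to subprocess.run with the words supplied one per line on standard input, for example subprocess.run(predict_command("g2p.fst", ["lexicon.dict"]), input="word1\nword2\n", text=True, check=True).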