dict-from-dict

Command-line interface (CLI) to create a pronunciation dictionary from another pronunciation dictionary, with the option of ignoring punctuation and splitting on hyphens before lookup.


Keywords
Language, Linguistics
License
MIT
Install
pip install dict-from-dict==0.0.4

Documentation


Features

  • ignores the casing of words during lookup
  • trims symbols from the start and end of a word before lookup
  • splits words on hyphens before lookup
    • if the dictionary contains words with hyphens, those entries are considered first (see example below)
  • supports words with multiple pronunciations
    • weights are multiplied for hyphenated words (see example below)
  • outputs out-of-vocabulary (OOV) words
  • supports multiprocessing
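
The lookup order described in the feature list can be sketched in Python. This is only an illustration under assumptions, not the tool's actual implementation: the names `trim` and `lookup` are hypothetical, the dictionary is assumed to be a plain `dict` with lowercase keys, and each trimmed symbol is assumed to be re-attached as its own pronunciation symbol, as the example output further down suggests.

```python
from itertools import product


def trim(word, symbols):
    """Split a word into (leading symbols, core, trailing symbols)."""
    start = 0
    while start < len(word) and word[start] in symbols:
        start += 1
    end = len(word)
    while end > start and word[end - 1] in symbols:
        end -= 1
    return word[:start], word[start:end], word[end:]


def lookup(word, dictionary, symbols, ignore_case=True, split_on_hyphen=True):
    """Return a list of (weight, phonemes) for a word, or [] if it is OOV."""

    def get(key):
        # Assumption: dictionary keys are already lowercase.
        return dictionary.get(key.lower() if ignore_case else key, [])

    # 1. The untrimmed word takes priority if the dictionary contains it.
    entries = get(word)
    if entries:
        return list(entries)

    # 2. Otherwise look up the trimmed word as a whole (hyphens included).
    lead, core, trail = trim(word, symbols)
    if not core:
        return []
    entries = get(core)

    # 3. Only then split on hyphens and combine the parts.
    if not entries and split_on_hyphen and "-" in core:
        parts = [get(part) for part in core.split("-")]
        if all(parts):
            entries = []
            for combo in product(*parts):
                weight = 1.0
                phonemes = []
                for i, (part_weight, part_phonemes) in enumerate(combo):
                    weight *= part_weight  # weights are multiplied
                    if i > 0:
                        phonemes.append("-")
                    phonemes.extend(part_phonemes)
                entries.append((weight, phonemes))

    # Re-attach the trimmed symbols around each pronunciation.
    return [(w, list(lead) + ph + list(trail)) for w, ph in entries]


dictionary = {
    "test": [(0.7, ["T", "E0", "S", "T"]), (0.3, ["T", "E1", "S", "T"])],
    "def": [(0.4, ["D", "E0", "F"]), (0.6, ["D", "E1", "F"])],
    "xyz": [(2.0, ["?"])],
    '"xyz?': [(1.0, ["'", "X", "Y", "Z", "??"])],
    "uv": [(2.0, ["?"])],
    "w": [(2.0, ["?"])],
    "uv-w": [(1.0, ["U", "V", "-", "W"])],
}
symbols = set('?",.')

for word in ["Test?", "abc,", '"def', "Test-def.", '"xyz?', '"uv-w?']:
    pronunciations = lookup(word, dictionary, symbols)
    if not pronunciations:
        print(f"OOV: {word}")
    for weight, phonemes in pronunciations:
        print(f"{word}  {weight}  {' '.join(phonemes)}")
```

The sketch processes one word at a time. Because every word is looked up independently, spreading the vocabulary across several processes is straightforward.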

Installation

pip install dict-from-dict --user

Usage

dict-from-dict-cli

Example

# Create example vocabulary
cat > /tmp/vocabulary.txt << EOF
Test?
abc,
"def
Test-def.
"xyz?
"uv-w?
EOF

# Create example dictionary
cat > /tmp/dictionary.dict << EOF
test  0.7  T E0 S T
test  0.3  T E1 S T
def  0.4  D E0 F
def  0.6  D E1 F
xyz  2.0  ?
"xyz?  1.0  ' X Y Z ??
uv  2.0  ?
w  2.0  ?
uv-w  1.0  U V - W
EOF

# Create dictionary from vocabulary and example dictionary
dict-from-dict-cli \
  /tmp/vocabulary.txt \
  /tmp/dictionary.dict --consider-weights \
  /tmp/result.dict \
  --ignore-case --split-on-hyphen \
  --trim "?" "\"" "," "." \
  --n-jobs 4 \
  --oov-out /tmp/oov.txt

cat /tmp/result.dict
# -------
# Output:
# -------
# Test?  0.7  T E0 S T ?
# Test?  0.3  T E1 S T ?
# "def  0.4  " D E0 F
# "def  0.6  " D E1 F
# Test-def.  0.27999999999999997  T E0 S T - D E0 F .
# Test-def.  0.42  T E0 S T - D E1 F .
# Test-def.  0.12  T E1 S T - D E0 F .
# Test-def.  0.18  T E1 S T - D E1 F .
# "xyz?  1.0  ' X Y Z ??
# "uv-w?  1.0  " U V - W ?
# -------

cat /tmp/oov.txt
# -------
# Output:
# -------
# abc,
# -------
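
In this example the weights of each word's pronunciations add up to 1, including the four multiplied weights for Test-def., because the weights for test and for def each add up to 1 in the example dictionary. The following Python sketch checks this. It is not part of the tool: the helper `parse_line` is hypothetical, and it assumes that word, weight and pronunciation are separated by two spaces, as in the listing above.

```python
import math
from collections import defaultdict

# Lines copied from the example result shown above.
RESULT = '''\
Test?  0.7  T E0 S T ?
Test?  0.3  T E1 S T ?
"def  0.4  " D E0 F
"def  0.6  " D E1 F
Test-def.  0.27999999999999997  T E0 S T - D E0 F .
Test-def.  0.42  T E0 S T - D E1 F .
Test-def.  0.12  T E1 S T - D E0 F .
Test-def.  0.18  T E1 S T - D E1 F .
"xyz?  1.0  ' X Y Z ??
"uv-w?  1.0  " U V - W ?
'''


def parse_line(line):
    """Split one line into (word, weight, list of pronunciation symbols)."""
    word, weight, phonemes = line.split("  ", 2)
    return word, float(weight), phonemes.split(" ")


# Sum the weights of all pronunciations per word.
totals = defaultdict(float)
for line in RESULT.splitlines():
    word, weight, _ = parse_line(line)
    totals[word] += weight

for word, total in totals.items():
    # Compare with a tolerance, since the products are floating-point values.
    status = "sums to 1" if math.isclose(total, 1.0) else "does not sum to 1"
    print(f"{word}: {status}")
```

The tolerance matters because products such as 0.7 * 0.4 are not exactly representable as floating-point numbers, which is why the listing shows 0.27999999999999997 rather than 0.28.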

Development setup

# update
sudo apt update
# install Python 3.8 to 3.12 so that the tests can be run for every supported version
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dict-creation.git
cd pronunciation-dict-creation
# create virtual environment
python3.8 -m pipenv install --dev

Running the tests

# first set up the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dict-creation
# activate environment
python3.8 -m pipenv shell
# run tests
tox

Final lines of test result output:

  py38: commands succeeded
  py39: commands succeeded
  py310: commands succeeded
  py311: commands succeeded
  py312: commands succeeded
  congratulations :)

License

MIT License

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

To cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).

Taubert, S. (2024). dict-from-dict (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.10560441