@theresnotime/ipa-validator

IPA validator


Keywords
IPA, validator
License
MIT
Install
npm install @theresnotime/ipa-validator@1.4.0

Documentation

Node package for validating and normalizing IPA


Installing

Go grab it on npmjs via npm i @theresnotime/ipa-validator

You'll then be all set to require it:

const ipaValidator = require('@theresnotime/ipa-validator');

Simples!

IPA goes in..

We pass the IPA as a string (with require, the await has to live inside an async function):

let validatorResult = await ipaValidator.validate('həˈləʊ');

..and a bool comes out!

Unsurprisingly, true for valid IPA, false for invalid IPA..

A note on "valid"

As the tests show, you need to use the correct Unicode characters: for example, həˈləʊ (with the IPA stress mark ˈ, U+02C8) is valid but hə'ləʊ (with an ASCII apostrophe, U+0027) is not.
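The two strings look almost identical on screen, so it can help to print the code points. A small sketch using only built-in string methods (codePoint is a hypothetical helper name, not something the package exports):

```javascript
// Hypothetical helper: format a character's first code point as U+XXXX.
function codePoint(char) {
    return 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

const stressMark = 'ˈ'; // IPA primary stress (MODIFIER LETTER VERTICAL LINE)
const apostrophe = "'"; // plain ASCII apostrophe

console.log(codePoint(stressMark)); // U+02C8
console.log(codePoint(apostrophe)); // U+0027
```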

Functions

validate

By default, the validate function strips delimiters (/.../, [...]) and checks that the string contains only valid IPA characters.

/**
 * Validate delimiter-stripped IPA, optionally normalising it first.
 * @param {string} ipa - IPA to validate.
 * @param {boolean} strip - Strip delimiters (default: true)
 * @param {boolean} normalizeIPA - Normalize IPA (default: false)
 * @param {boolean} google - Normalize IPA for Google TTS (default: false)
 * @returns {boolean} - Whether the IPA is valid.
 */
function validate(ipa, strip = true, normalizeIPA = false, google = false)
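To make the effect of the strip argument concrete, here is a minimal sketch of the strip-then-match idea. It is not the package's implementation: sketchValidate is a hypothetical name, the delimiter stripping is a naive one-liner, and the character class is a deliberately tiny subset of the real regex, just enough for the həˈləʊ example.

```javascript
// Deliberately tiny subset of IPA characters, for illustration only.
const SKETCH_PATTERN = /^[a-zəʊˈˌː.]+$/u;

// Hypothetical stand-in mirroring validate's first two parameters.
function sketchValidate(ipa, strip = true) {
    // Naive stripping: drop a leading "/" or "[" and a trailing "/" or "]".
    const candidate = strip ? ipa.replace(/^[\/\[]|[\/\]]$/g, '') : ipa;
    return SKETCH_PATTERN.test(candidate);
}

console.log(sketchValidate('/həˈləʊ/'));        // true: delimiters stripped first
console.log(sketchValidate('/həˈləʊ/', false)); // false: "/" is not in the class
console.log(sketchValidate("hə'ləʊ"));          // false: ASCII apostrophe
```

Under these assumptions, turning strip off means the delimiters themselves get checked against the character class, so a delimited string fails.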

normalize

The normalize function ensures that the IPA uses the correct Unicode characters where similar-looking ones exist (e.g. that you're using ˈ instead of '). By default, it does not strip delimiters.

/**
 * Normalize IPA
 * @param {string} ipa - IPA to normalize.
 * @param {boolean} strip - Strip delimiters (default: false)
 * @param {boolean} google - Normalize IPA for Google TTS (default: false)
 * @returns {string} - normalized IPA
 */
function normalize(ipa, strip = false, google = false)

stripIPA

The stripIPA function strips delimiters (/.../, [...]) from the IPA.

/**
 * Strip IPA delimiters (currently /.../ and [...])
 * @param {string} ipa - IPA to strip.
 * @returns {string} - Stripped IPA
 */
function stripIPA(ipa)
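For illustration, one way delimiter stripping could be written is with a single regex that only fires on a matching pair. This is a sketch under that assumption, with stripDelimiters as a hypothetical name; the package's own stripIPA may treat edge cases (such as mismatched delimiters) differently.

```javascript
// Hypothetical helper (not the package's code): remove one matching pair of
// IPA delimiters, either /.../ (phonemic) or [...] (phonetic).
function stripDelimiters(ipa) {
    const match = ipa.match(/^\/(.*)\/$|^\[(.*)\]$/u);
    if (!match) {
        return ipa; // no matching pair: leave the input untouched
    }
    // Group 1 is set for /.../, group 2 for [...].
    return match[1] !== undefined ? match[1] : match[2];
}

console.log(stripDelimiters('/həˈləʊ/')); // həˈləʊ
console.log(stripDelimiters('[həˈləʊ]')); // həˈləʊ
console.log(stripDelimiters('/həˈləʊ]')); // /həˈləʊ] (mismatched, unchanged)
```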

removeDiacritics

The removeDiacritics function removes diacritics from the IPA.

/**
 * Remove diacritics
 * @param {string} ipa - IPA to modify.
 * @param {boolean} strip - Strip delimiters (default: false)
 * @returns {string} - modified IPA
 */
function removeDiacritics(ipa, strip = false)
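Which characters the package counts as diacritics isn't spelled out here, so treat the following as a rough sketch of the general idea rather than what removeDiacritics actually does. It assumes "diacritic" means a Unicode nonspacing combining mark (category Mn), and removeCombiningMarks is a hypothetical name.

```javascript
// Hypothetical helper: drop every nonspacing combining mark (Unicode category Mn).
function removeCombiningMarks(ipa) {
    return ipa.replace(/\p{Mn}/gu, '');
}

// U+0329 COMBINING VERTICAL LINE BELOW marks a syllabic consonant.
const syllabicN = 'n\u0329';
console.log(removeCombiningMarks(syllabicN) === 'n'); // true

// U+02B0 (aspiration) is a modifier letter, not a combining mark, so it stays.
console.log(removeCombiningMarks('t\u02B0') === 't\u02B0'); // true
```

Note that the sketch does not run Unicode normalization first. Decomposing would split a letter like ç into c plus a cedilla, and stripping the cedilla would then change a valid IPA letter.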

"Google" option

As part of a work project, we're feeding IPA to Google's TTS engine, and Google is a little opinionated about things like diacritics. For example, the IPA ˈɔːfɫ̩ would not render correctly in Google TTS. When the google option is set, a custom charmap is used to normalize certain characters:

let charmap = [
    ['(', ''],
    [')', ''],
    ["'", 'ˈ'],
    [':', 'ː'],
    [',', 'ˌ'],
    ['ⁿ', 'n'], // 207F
    ['ʰ', 'h'], // 02B0
    ['ɫ', 'l'], // 026B
    ['ˡ', 'l'], // 02E1
    ['ʲ', 'j'], // 02B2
];

Doing Google-y normalizing is just a call like:

await ipaValidator.normalize('ˈɔːfɫ̩', true, true);
// Returns ˈɔːfl

Some further examples can be seen in google.test.js.
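If you're curious how a charmap like the one above can be applied, here's a minimal sketch. The charmap is copied from this README, but applyCharmap is a hypothetical helper and only covers the character-replacement step; whatever else the package does in Google mode (such as handling remaining diacritics) is not shown.

```javascript
// Charmap copied from the README section above.
const charmap = [
    ['(', ''],
    [')', ''],
    ["'", 'ˈ'],
    [':', 'ː'],
    [',', 'ˌ'],
    ['ⁿ', 'n'], // 207F
    ['ʰ', 'h'], // 02B0
    ['ɫ', 'l'], // 026B
    ['ˡ', 'l'], // 02E1
    ['ʲ', 'j'], // 02B2
];

// Hypothetical helper: apply each [from, to] pair in order, replacing every occurrence.
function applyCharmap(ipa, map) {
    return map.reduce((result, [from, to]) => result.split(from).join(to), ipa);
}

console.log(applyCharmap("'ɔ:fɫ", charmap)); // ˈɔːfl
console.log(applyCharmap('(tʰ)', charmap));  // th
```

Using split and join keeps the sketch free of regex escaping, which matters because ( and ) are regex metacharacters.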

Developing

  1. Fork n' clone this repo
  2. Do an npm install
  3. Run npm test because who knows, maybe it's already broken
  4. Hack!

The Regex

^[().a-z|æçðøħŋœǀ-ǃɐ-ɻɽɾʀ-ʄʈ-ʒʔʕʘʙʛ-ʝʟʡʢʰʲʷʼˀˈˌːˑ˞ˠˡˤ-˩̴̘̙̜̝̞̟̠̤̥̩̪̬̯̰̹̺̻̼̀́̂̃̄̆̈̊̋̌̏̽̚͜͡βθχ᷄᷅᷈‖‿ⁿⱱ]+$

I've also placed it at https://regex101.com/r/f2Qhuk if you think you can improve it... (please do!)