ipa2unicode

Package for converting text encoded in the legacy SIL IPA93 font to Unicode


Keywords
SIL, IPA93, unicode
License
MIT
Install
pip install ipa2unicode==1.3

Documentation

Overview

This package converts text encoded using the legacy SIL IPA93 font to Unicode.

It provides a single function, convert_to_unicode(), which relies on a dictionary mapping IPA93 glyph codes to their corresponding Unicode code point(s). This is useful if, for example, you are working with a resource like the [MOSS Aphasia MAPPD dataset](https://www.mappd.org/about.html).
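As a rough illustration of what dictionary-driven conversion looks like, the sketch below passes each input character through a lookup table and leaves unmapped characters unchanged. The table `toy_map` and the function `convert_with_map` are hypothetical stand-ins, not the package's data or implementation. The three entries are illustrative only, and the assumption that keys are single characters and values are Unicode strings may not match the real dictionary.

```python
# Hypothetical stand-in table, for illustration only; the package's real
# mapping is sil_to_unicode_dict, whose key/value types may differ.
toy_map = {
    "S": "\u0283",  # ʃ
    "N": "\u014b",  # ŋ
    "T": "\u03b8",  # θ
}


def convert_with_map(text, mapping):
    """Replace each character found in mapping; keep all others unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


print(convert_with_map("SN", toy_map))
```

In practice convert_to_unicode() should be preferred, since the real conversion may involve details that a one-to-one character lookup like this does not capture.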

The package also exposes the dictionary itself, sil_to_unicode_dict, in case it is more convenient to use it directly. Lastly, the package contains a list of Unicode diacritics (ipa_diacritics_unicode), which may be useful for removing diacritics from the converted text in a post-processing step.
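One possible way to do such a post-processing step is sketched below. It assumes ipa_diacritics_unicode is a list of Unicode strings; to stay self-contained, the example uses a small hypothetical stand-in list (`sample_diacritics`) and a hypothetical helper (`strip_diacritics`), neither of which is part of the package.

```python
import unicodedata

# Hypothetical stand-in for ipa_diacritics_unicode (assumed to be a list of
# Unicode strings): combining tilde, combining ring below, modifier letter h.
sample_diacritics = ["\u0303", "\u0325", "\u02b0"]


def strip_diacritics(text, diacritics):
    """Return text with every character listed in diacritics removed."""
    # NFD splits precomposed characters (e.g. ã) into base + combining mark,
    # so the combining mark can be matched and removed.
    decomposed = unicodedata.normalize("NFD", text)
    table = {ord(ch): None for d in diacritics for ch in d}
    # Recompose whatever remains so unrelated characters are not left split.
    return unicodedata.normalize("NFC", decomposed.translate(table))


print(strip_diacritics("p\u02b0a\u0303", sample_diacritics))  # prints "pa"
```

With the package installed, the same helper could presumably be called with ipa_diacritics_unicode in place of `sample_diacritics`, provided the list has the assumed shape.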

Notes

Usage

The following code snippet illustrates the usage of convert_to_unicode(), which takes a string of SIL IPA93 glyph access codes and returns an equivalent Unicode string. In this example, we assume the input Excel file MAPPD.xlsx contains a structured data set in which the IPA93 data lives in a column called "Phonetic_response". We pass each value in this column to convert_to_unicode(), store the result in a new column called "New_phonetic_response", and write the new data set to a file called "MAPPD.new.xlsx":

import pandas as pd
# Import path assumed from the package name.
from ipa2unicode import convert_to_unicode

mappd_df = pd.read_excel('MAPPD.xlsx')
# convert_to_unicode() expects a string, so replace null values with
# empty strings first.
mappd_df['New_phonetic_response'] = mappd_df['Phonetic_response'].fillna('')
mappd_df['New_phonetic_response'] = mappd_df['New_phonetic_response'].map(convert_to_unicode)
mappd_df.to_excel('MAPPD.new.xlsx', index=False)