loquax

A Classical Phonology framework


Keywords
digital-humanities, functional-programming, history, linguistics, nlp, nlp-library, nlp-parsing, phonological-features, phonology
License
GPL-3.0
Install
pip install loquax==0.1.5

Documentation

Loquax (Latin for "chatty") is an extensible Python library for phonological processing. With hobbyists and academics in mind, it provides functionality for:

...

...

It's a zero-dependency framework that uses functional-style Python 3.10+ features to revive the sounds of the past, one phoneme at a time. To see Loquax in action without diving into code, visit Loquax Latin Online.

Quickstart

pip install loquax

from loquax import Document
from loquax.languages import Latin

catilinarian_orations = Document("Quoūsque tandem abutēre, Catilīna, patientiā nostrā?", Latin)
print(catilinarian_orations.to_string(ipa=True, scansion=True))

# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː
#  u   -   u      -   u     u u   -  u     u  u   -  u     u  u  u  u  -      u   -

Syllabification and Tokenization

print(catilinarian_orations.tokens)

# outputs:
# [kʷɔ.uːs.kʷɛ, tan.dɛm, a.bʊ.teː.rɛ, ka.tɪ.liː.na, pa.tɪ.ɛn.tɪ.aː, nɔs.traː]

print(catilinarian_orations.tokens[0].syllables)

# outputs:
# [quo, ūs, que]

Phoneme Analysis

Inspect individual sounds and their roles within words, relative to a Language.

from loquax.abstractions import Phoneme
from loquax.languages import Latin

r = Phoneme('r', Latin)
print(r.is_consonant and r.is_liquid)  # outputs: True

Morphological Transformations

The central problem of phonology is that linguistic units have changing features depending on their context and neighbours.

Loquax allows users to tackle this by defining their own morphisms.

'''
In this example, we create a `Morphism` that targets syllables with a nucleus and at least one coda, 
then transforms them into long syllables. The transformation is only applied if the next syllable 
has an onset of length greater than or equal to one. 
'''

from loquax.morphisms import Morphism, Rule, RuleSequence
from loquax.syllables import Syllable
from dataclasses import replace

long_position_morphism = Morphism[Syllable](
    target=Rule[Syllable](check_fn=lambda s: s.nucleus and s.coda and len(s.coda) >= 1),
    transformation=lambda s: replace(s, is_long=True),
    suffix=RuleSequence(
        [Rule[Syllable](check_fn=lambda s: s.onset and len(s.onset) >= 1)]
    ),
)
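
The target/suffix idea above can be sketched without Loquax at all. The block below is a minimal, self-contained illustration, not Loquax's implementation: `ToySyllable` and `apply_contextual` are hypothetical names invented for this example, and the only assumption carried over from the text is that a rule fires when an item matches `target` and the following item matches `suffix`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class ToySyllable:
    # Hypothetical stand-in for a syllable; NOT Loquax's own Syllable class.
    onset: str
    nucleus: str
    coda: str
    is_long: bool = False


def apply_contextual(
    items: Sequence[ToySyllable],
    target: Callable[[ToySyllable], bool],
    transformation: Callable[[ToySyllable], ToySyllable],
    suffix: Callable[[ToySyllable], bool],
) -> list[ToySyllable]:
    """Transform each item that matches `target` and is followed by an item matching `suffix`."""
    result = []
    for i, item in enumerate(items):
        has_next = i + 1 < len(items)
        if target(item) and has_next and suffix(items[i + 1]):
            result.append(transformation(item))
        else:
            result.append(item)
    return result


# "tan.dem": the first syllable is closed and the next one begins with a consonant.
tandem = [ToySyllable("t", "a", "n"), ToySyllable("d", "e", "m")]
marked = apply_contextual(
    tandem,
    target=lambda s: bool(s.nucleus) and len(s.coda) >= 1,
    transformation=lambda s: replace(s, is_long=True),
    suffix=lambda s: len(s.onset) >= 1,
)
print([s.is_long for s in marked])  # [True, False]
```

Because the toy dataclass is frozen and `replace` returns a copy, the input list is left untouched, which fits the functional style the library describes.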

MorphismStore lets you organize your morphisms and apply all of them, in order, to a given syllable or phoneme sequence:

from loquax.abstractions import MorphismStore

# Assuming morphism1, morphism2, morphism3 are predefined Morphism objects...
morphism_store = MorphismStore([morphism1, morphism2, morphism3])

syllables_sequence = [syllable1, syllable2, syllable3]

# Apply all transformations stored in MorphismStore
transformed_sequence = morphism_store.apply_all(syllables_sequence)

# transformed_sequence now holds the syllables transformed by morphism1, morphism2, morphism3 in order.
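
The "in order" part can be pictured as a left-to-right fold, where the output of one transformation becomes the input of the next. This is a rough sketch in plain Python under that assumption; the `apply_all` function and the two toy transforms below are hypothetical and say nothing about how `MorphismStore` is actually implemented.

```python
from functools import reduce
from typing import Callable, Sequence

# A transform takes a list of syllable strings and returns a new list.
Transform = Callable[[list[str]], list[str]]


def apply_all(transforms: Sequence[Transform], sequence: list[str]) -> list[str]:
    """Feed the output of each transform into the next, left to right."""
    return reduce(lambda seq, fn: fn(seq), transforms, sequence)


# Two hypothetical transforms, for illustration only.
def lowercase(seq: list[str]) -> list[str]:
    return [s.lower() for s in seq]


def strip_punctuation(seq: list[str]) -> list[str]:
    return [s.strip(",.?") for s in seq]


print(apply_all([lowercase, strip_punctuation], ["Ca", "ti", "li", "na,"]))
# ['ca', 'ti', 'li', 'na']
```

Order matters whenever one transformation creates or removes the context another one looks for, which is why a store that applies its morphisms sequentially needs them listed deliberately.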

IPA Transliteration

To convert text into the International Phonetic Alphabet (IPA) for universal comprehension, call the to_string method with ipa=True:

print(catilinarian_orations.to_string(ipa=True))

# outputs:
# kʷɔ.uːs.kʷɛ    tan.dɛm    a.bʊ.teː.rɛ    ka.tɪ.liː.na    pa.tɪ.ɛn.tɪ.aː    nɔs.traː

Scansion

Scansion is the process of marking the metrical pattern of a poem and dividing its lines into feet; in Latin and Ancient Greek verse that pattern is built from long and short syllables rather than stress. It's a critical part of the study and enjoyment of classical verse. Loquax makes it easy to integrate scansion into your language analysis pipeline.

Currently, Loquax only distinguishes long syllables (-) from short ones (u); lines are not yet divided into feet.

print(catilinarian_orations.to_string(scansion=True))

# outputs:
# quo.ūs.que    tan.dem    a.bu.tē.re    ca.ti.lī.na    pa.ti.en.ti.ā    nos.trā
#  u  -   u      -   u     u u  -  u     u  u  -  u     u  u  u  u  -     u   -
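
To show how a two-line layout like the one above might be produced, here is a small sketch that places a `-` or `u` beneath each syllable. It is not Loquax's rendering code: `render_scansion` is a hypothetical helper, and the long/short flags are supplied by hand (copied from the output above) rather than computed.

```python
def render_scansion(words: list[list[tuple[str, bool]]], gap: str = "    ") -> str:
    """Return two lines: dotted syllables, then '-' (long) or 'u' (short) under each one."""
    text_parts, mark_parts = [], []
    for word in words:
        text_parts.append(".".join(text for text, _ in word))
        # Each mark is centred in a field as wide as its syllable;
        # a single space stands in for the dot between syllables.
        mark_parts.append(
            " ".join(("-" if is_long else "u").center(len(text)) for text, is_long in word)
        )
    return gap.join(text_parts) + "\n" + gap.join(mark_parts)


# Flags taken by hand from the README output for "tandem" and "nostrā".
line = [
    [("tan", True), ("dem", False)],
    [("nos", False), ("trā", True)],
]
print(render_scansion(line))
```

Since every mark field is exactly as wide as its syllable, the two lines always have the same length, so the marks stay aligned however long the words are.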

Extensibility

Loquax allows for extensibility, so you can build and customize your own language rules for unique or theoretical languages. Here's an example of how to define custom rules and apply them:

# Create your own custom language with unique rules and phonemes
from loquax.languages import Latin
from loquax.abstractions import (
    PhonemeSyllabificationRuleStore, Language, 
    Constants, Tokenizer, MorphismStore, 
    Syllable, Morphism, Phoneme
)

# Let's suppose we have defined custom syllabification rules and constants
syllabification_rules = PhonemeSyllabificationRuleStore(...)
constants = Constants(...)
tokenizer = Tokenizer(...)
syllable_morphisms = MorphismStore[Syllable]([...])
phoneme_morphisms = MorphismStore[Phoneme]([...])

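# Create the language object; new `Document`s and other abstractions can then be instantiated with it.
# NOTE: Python does not allow positional arguments after keyword arguments, so every
# argument is passed by keyword here. The last five keyword names are assumed to mirror
# the variable names; check them against the `Language` definition.
my_lang = Language(
    language_name='MyLang',
    iso_639_code='myl',  # Made-up ISO 639 code for our custom language
    constants=constants,
    syllabification_rules=syllabification_rules,
    syllable_morphisms=syllable_morphisms,
    phoneme_morphisms=phoneme_morphisms,
    tokenizer=tokenizer,
)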