ipanema

Packaged language data from Wiktionary


License

MIT

Install

$ pip install ipanema==0.2.20190403

Documentation

/ipaˈnẽmɐ/

ipanema is an attempt to create a central repository of structured, language-related metadata for applications that need to work with many different languages.

Data is aggregated from various sources and combined into a single SQLite database which can be queried easily.
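The database schema is not described in this README. The sketch below therefore uses a made-up `languages` table (hypothetical name and columns) in an in-memory database to show the general pattern of a parameterized query with Python's standard `sqlite3` module. To use it for real, point `sqlite3.connect` at the generated database file and adapt the SQL to its actual tables.

```python
import sqlite3

# In-memory stand-in for the generated database. The table name and
# columns below are HYPOTHETICAL and only illustrate the query pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (code TEXT PRIMARY KEY, name TEXT, family TEXT)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?, ?)",
    [
        ("pt", "Portuguese", "Romance"),
        ("de", "German", "West Germanic"),
        ("nl", "Dutch", "West Germanic"),
    ],
)


def languages_in_family(conn: sqlite3.Connection, family: str):
    """Return (code, name) pairs for one family, sorted by name.

    The "?" placeholder lets sqlite3 bind the value safely instead of
    interpolating it into the SQL string.
    """
    rows = conn.execute(
        "SELECT code, name FROM languages WHERE family = ? ORDER BY name",
        (family,),
    )
    return rows.fetchall()


print(languages_in_family(conn, "West Germanic"))  # [('nl', 'Dutch'), ('de', 'German')]
```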

The Wiktionary language code is assigned by the first of the following rules that applies:

  1. If the language has a two-letter code in the ISO 639-1 standard, then that code is used.
  2. If the language has a three-letter code in the ISO 639-3 standard, then that code is used.
  3. If the language has a three-letter code in the ISO 639-2 standard, then that code is used. (rare)
  4. Any language which does not have an ISO code, but which is to be included in Wiktionary, has a new Wiktionary-specific "exceptional" code devised for it.
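The precedence in the rules above amounts to "take the first code that exists". A minimal sketch, assuming the candidate codes for a language are already known; the function name and parameters are hypothetical and not part of ipanema:

```python
from typing import Optional


def wiktionary_code(
    iso639_1: Optional[str] = None,
    iso639_3: Optional[str] = None,
    iso639_2: Optional[str] = None,
    exceptional: Optional[str] = None,
) -> str:
    """Pick a code following rules 1-4 in order (hypothetical helper)."""
    # Order matters: ISO 639-1, then ISO 639-3, then ISO 639-2,
    # then a Wiktionary-specific "exceptional" code.
    for code in (iso639_1, iso639_3, iso639_2, exceptional):
        if code:
            return code
    raise ValueError("language has no ISO code and no exceptional code")


print(wiktionary_code(iso639_1="pt", iso639_3="por"))  # pt
print(wiktionary_code(iso639_3="nds", iso639_2="nds"))  # nds
print(wiktionary_code(exceptional="gem-pro"))  # gem-pro
```

The example codes are illustrative inputs; the function only encodes the ordering, not any lookup of real ISO tables.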

Data sources

Language data

Source: Module:languages/data2, Module:languages/data3

$ make -j4 -f Makefile.lang-data data/lang_data.json

Language families

Source: Module:families/data

$ make -f Makefile.lang-data data/lang_families.json
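Language families nest (a subfamily belongs to a larger family), so a common operation is walking up to the top-level family. The sketch below assumes, hypothetically, that each family record can be reduced to a child-to-parent mapping; the `PARENTS` dictionary is hand-written for illustration and is not read from `data/lang_families.json`.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL child -> parent mapping; None marks a top-level family.
PARENTS: Dict[str, Optional[str]] = {
    "ine": None,   # Indo-European
    "gem": "ine",  # Germanic
    "gmw": "gem",  # West Germanic
}


def family_chain(code: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links from `code` up to a top-level family."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = code
    while current is not None:
        if current in seen:  # guard against malformed, cyclic data
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)  # unknown codes end the walk
    return chain


print(family_chain("gmw", PARENTS))  # ['gmw', 'gem', 'ine']
```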

Native language names

Source: Names.php

$ make -f Makefile.native

CLDR characters / relative time patterns

Source: CLDR

$ make -f Makefile.cldr
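CLDR relative time patterns are selected by direction (future or past) and by the plural category of the number, and the number is then substituted for the `{0}` placeholder. The sketch below hand-copies English day patterns in that shape and uses a simplified English plural rule for integers; it does not assume anything about how ipanema stores the extracted patterns.

```python
from typing import Dict

# English "day" patterns in the shape CLDR uses, keyed by direction
# and plural category. Hand-written here for illustration.
EN_DAY: Dict[str, Dict[str, str]] = {
    "future": {"one": "in {0} day", "other": "in {0} days"},
    "past": {"one": "{0} day ago", "other": "{0} days ago"},
}


def english_plural_category(n: int) -> str:
    # Simplified English rule for non-negative integers:
    # exactly 1 is "one", everything else is "other".
    return "one" if n == 1 else "other"


def format_relative_days(delta: int, patterns: Dict[str, Dict[str, str]] = EN_DAY) -> str:
    """Format a day offset; 0 is not special-cased (CLDR has separate "today" strings)."""
    direction = "future" if delta >= 0 else "past"
    count = abs(delta)
    pattern = patterns[direction][english_plural_category(count)]
    return pattern.replace("{0}", str(count))


print(format_relative_days(1))   # in 1 day
print(format_relative_days(-3))  # 3 days ago
```

Other languages have more plural categories (for example "few" or "many"), so a real implementation would need each locale's CLDR plural rules rather than the English shortcut above.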

IPA

Source: Module:IPA/data/symbols, Wikipedia

$ redis-server    # keep running (e.g. in a separate terminal) during the build
$ make -f Makefile.ipa
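When checking transcriptions against an inventory of IPA symbols, a useful first step is splitting a transcription into segments, where each base character keeps any combining diacritics that follow it. The sketch below does this with Python's standard `unicodedata` module; it is a generic illustration, not the procedure used by `Makefile.ipa`.

```python
import unicodedata
from typing import List


def ipa_segments(transcription: str) -> List[str]:
    """Split an IPA string into base characters plus trailing combining marks."""
    # Drop surrounding /.../ or [...] delimiters, then decompose so that
    # precomposed letters such as "ẽ" become "e" + U+0303 (combining tilde).
    text = unicodedata.normalize("NFD", transcription.strip("/[]"))
    segments: List[str] = []
    for ch in text:
        if unicodedata.combining(ch) and segments:
            segments[-1] += ch  # attach the diacritic to the previous base
        else:
            segments.append(ch)  # base letters and modifier letters like "ˈ"
    return segments


segs = ipa_segments("/ipaˈnẽmɐ/")
print(len(segs))  # 8
```

The stress mark "ˈ" is a spacing modifier letter rather than a combining mark, so it comes out as its own segment.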

Links

Wikidata

CLDR (Common Locale Data Repository)