The trans module
This module translates national characters into similar-sounding Latin characters (transliteration). At the moment, the Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian, Kazakh and Farsi alphabets are supported (by the author's estimate, this covers 99% of needs).
Simple usage
It's very easy to use.
Python 3:
>>> from trans import trans
>>> trans('Привет, Мир!')
'Privet, Mir!'
Python 2:
>>> import trans
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'
>>> trans.trans(u'Привет, Мир!')
u'Privet, Mir!'
The codec works only with Unicode strings:
>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
    ...
TypeError: trans codec support only unicode string, <type 'str'> given.
The result is readable:
>>> s = u'''\
... -- Раскудрить твою через коромысло в бога душу мать
... триста тысяч раз едрену вошь тебе в крыло
... и кактус в глотку! -- взревел разъяренный Никодим.
... -- Аминь, -- робко добавил из склепа папа Пий.
... (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
-- Raskudrit tvoyu cherez koromyslo v boga dushu mat
trista tysyach raz edrenu vosh tebe v krylo
i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
-- Amin, -- robko dobavil iz sklepa papa Piy.
(c) G. L. Oldi, "Skazki dedushki vampira".
Table "slug"
Use the "slug" table to keep only Latin characters, digits and underscores:
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5 
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'
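The filtering step of the "slug" table can be pictured with a small standalone sketch. This is not the library's code: `slug_filter` and `ALLOWED` are hypothetical names, the sketch is written for Python 3, and it covers only input that is already Latin (the real table also transliterates national characters).

```python
import string

# Assumption: the characters that survive are ASCII letters, digits and "_".
ALLOWED = set(string.ascii_letters + string.digits + "_")


def slug_filter(text, placeholder="_"):
    """Replace every character outside ALLOWED with the placeholder."""
    return "".join(ch if ch in ALLOWED else placeholder for ch in text)


print(slug_filter("1 2 3 4 5 \n6 7 8 9 0"))  # 1_2_3_4_5__6_7_8_9_0
```

Each space, newline or punctuation mark becomes exactly one underscore, which is why the space followed by the newline shows up as a double underscore in the result.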
Table "id"
The "id" table is deprecated and has been renamed to "slug". The old name is still available, but its use is not recommended.
Define user tables
Simple variant
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
    ...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'}
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
A little harder
A table can consist of two parts: a map of diphthongs and a map of characters. Diphthongs are processed first, by simple substring replacement. Then each character of the resulting string is replaced according to the map of characters. If a character is absent from the map of characters, the key None is checked. If the key None is not present either, the default character u'_' is used.
>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
...               u'A': u'A', u'B': u'B'}  # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'
The characters produced by diphthong replacement are also processed by the map of characters:
>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'
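The two-pass lookup described in this section can be sketched in a few lines of plain Python 3. This is only an illustration of the rules as stated here, not the module's actual implementation; `apply_table` is a hypothetical name, and the order in which diphthongs are replaced (dictionary order) is an assumption.

```python
def apply_table(text, diphthongs, characters, default="_"):
    """Apply a two-part table: diphthongs first, then single characters."""
    # Pass 1: plain substring replacement for every diphthong.
    # Assumption: replacements happen in dictionary order.
    for source, target in diphthongs.items():
        text = text.replace(source, target)
    # Pass 2: map each character; unknown characters fall back to the
    # None key, and to the default "_" when there is no None key.
    fallback = characters.get(None, default)
    return "".join(characters.get(ch, fallback) for ch in text)


diphthongs = {"11": "AA", "22": "BB"}
characters = {"a": "z", "b": "y", "c": "x", None: "-"}

# "AA" and "BB" are not in the character map, so they collapse to "-".
print(apply_table("11abc22cbaCC", diphthongs, characters))  # --zyx--xyz--

# Mapping A and B to themselves lets the diphthong output survive pass 2.
characters.update({"A": "A", "B": "B"})
print(apply_table("11abc22cbaCC", diphthongs, characters))  # AAzyxBBxyz--
```

Because the second pass runs over the output of the first, any character a diphthong produces must itself appear in the map of characters, or it is replaced by the fallback.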
Without the diphthongs
These two tables are equivalent:
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
True
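One way to picture this equivalence is a normalization step that turns either form into a (diphthongs, characters) pair. The helper below is a hypothetical Python 3 sketch; whether the library normalizes tables this way internally is an assumption.

```python
def normalize_table(table):
    """Return a (diphthongs, characters) pair for either table form."""
    if isinstance(table, dict):
        # A bare dict is treated as a character map with no diphthongs.
        return {}, table
    diphthongs, characters = table
    return diphthongs, characters


characters = {"a": "z", "b": "y", "c": "x", None: "-"}
print(normalize_table(characters) == normalize_table(({}, characters)))  # True
```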
ChangeLog
2.1 2016-09-19
- Add Farsi alphabet (thx rodgar-nvkz)
- Use pytest
- Some code style refactoring
2.0 2013-04-01
- Python 3 support
- Add class Trans for creating separate table spaces
1.5 2012-09-12
- Add support for the Kazakh alphabet.
1.4 2011-11-29
- Change license to BSD.
1.3 2010-05-18
- Table "id" renamed to "slug". Old name also available.
- Some speed optimizations (thx to AndyLegkiy <andy.legkiy at gmail.com>).
1.2 2010-01-10
- First public release.
- Translate documentation to English.
Finally
- Special thanks to Yuri Yurevich aka j2a for the kick in the right direction.
I apologize for my bad English. I promise to improve it.