utf9

Encode and decode text using UTF-9.


Keywords
utf9, encode, decode, rfc4042
License
MIT
Install
pip install utf9==0.3.1

Documentation

Description

On April 1st, 2005, RFC 4042, “UTF-9 and UTF-18 Efficient Transformation Formats of Unicode”, was published:

The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet.

Since few architectures use 9-bit nonets as their natural storage unit, and since the RFC was released on April Fools’ Day, the beautiful UTF-9 was forgotten and no Python implementation was available.

This Python module is here to fill this gap! ;)
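
As a rough illustration of the format itself (not of this module's internals, which are not described here), the nonet layout from RFC 4042 can be sketched as follows. Each nonet carries one octet of the code point in its low 8 bits, and the high bit signals that more nonets follow. The helper names to_nonets and from_nonets are hypothetical and are not part of the utf9 API. The sketch assumes Python 3 and stops at a list of 9-bit integers, so it does not cover packing nonets into bytes.

```python
def to_nonets(text):
    """Return the UTF-9 nonets (ints in 0..0x1FF) for a Unicode string.

    Hypothetical helper for illustration; not part of the utf9 module.
    """
    nonets = []
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogates are not encoded in UTF-9")
        # Split the code point into octets, most significant first,
        # dropping leading zero octets but keeping at least one octet.
        octets = cp.to_bytes(4, "big").lstrip(b"\x00") or b"\x00"
        # Every nonet except the last carries the continuation bit 0x100.
        nonets.extend(0x100 | o for o in octets[:-1])
        nonets.append(octets[-1])
    return nonets


def from_nonets(nonets):
    """Rebuild the string from a list of UTF-9 nonets.

    Hypothetical helper; a trailing incomplete sequence is silently ignored.
    """
    chars = []
    cp = 0
    for n in nonets:
        cp = (cp << 8) | (n & 0xFF)
        if not n & 0x100:  # continuation bit clear: code point is complete
            chars.append(chr(cp))
            cp = 0
    return "".join(chars)


# U+0041, U+2262, U+0391, U+002E, shown in octal
print([oct(n) for n in to_nonets("A\u2262\u0391.")])
# → ['0o101', '0o442', '0o142', '0o403', '0o221', '0o56']
```

Code points up to U+00FF fit in one nonet, those up to U+FFFF take two, and the rest of Unicode takes three.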

Usage

There are only two functions:

  • utf9encode(string): takes a string and returns a utf9-encoded version.
  • utf9decode(data): takes utf9-encoded data and returns the corresponding string.

Example

>>> import utf9
>>> encoded = utf9.utf9encode(u'ႹЄLᒪo, 🌍ǃ')
>>> print(repr(encoded))
'p\xe0\xb7-\x0c!1\xc3\x92\xd5\x1b\xc5\x82\x07n\x83x\xed\xdecX\xf80'
>>> print(utf9.utf9decode(encoded))
ႹЄLᒪo, 🌍ǃ