unistring

Unicode utf-8 functions


License
Unlicense

Documentation

unistring

Why

Starting with Erlang R16B, string module in standard library handles Unicode utf-8 strings very well, except that string:to_upper/1 and string:to_lower/1 ignore unicode code points > 255.

Unistring library uses Lowercase Mapping, Uppercase Mapping and Titlecase Mapping columns hexadecimal values from UnicodeData.txt.

License

Public Domain

API

Unicode utf-8 functions for Erlang:

unistring:to_lower/1
unistring:to_upper/1
unistring:to_title/1

Input to functions can be in charlist() or unicode_binary() format.

charlist() is a list of unicode code points, such as: [50504,45397,54616,49464,50836] means "안녕하세요". unicode_binary() is a binary string with utf8 encoding, such as: <<236,149,136,235,133,149,237,149,152,236,132,184,236,154,148>> means "안녕하세요".

charlist() and unicode_binary() can be converted from one to another using the unicode module in Erlang R16B or later.

Examples

Make sure to start your erl shell with erl +pc unicode Make sure to add %% -*- coding: utf-8 -*- to the beginning of your *.erl source code files if you will be using unicode characters inside of quotes. If you are creating unicode binary strings, make sure to append /utf8 after the string's end quote mark inside of <<>> such as: ```<<"안녕하세요"/utf8>>.

%% unistring:to_lower/1

"test" = unistring:to_lower("TEST").
<<"test">> = unistring:to_lower(<<"TEST">>).
"привет" = unistring:to_lower("ПРИВЕТ").
<<"привет"/utf8>> = unistring:to_lower(<<"ПРИВЕТ"/utf8>>).

%% unistring:to_upper/1
"TEST" = unistring:to_upper("test").
<<"TEST">> = unistring:to_upper(<<"test">>).
"ПРИВЕТ" = unistring:to_upper("привет").
<<"ПРИВЕТ"/utf8>> = unistring:to_upper(<<"привет"/utf8>>).

Note

I don't really see the point of unistring:to_title/1 but it is included in this library for completeness.