unicodeblocks – Character blocks defined in Unicode
This module supplements the unicodedata standard library module with the ability to look up and work with Unicode blocks.
API
unicodeblocks.version
Version of the module.
unicodeblocks.unidata_version
The version of the Unicode database used by this module.
unicodeblocks.Block(name, start, end)
unicodeblocks.Block.name
Normalized name of block.
unicodeblocks.Block.start
The first codepoint mapped by block. Inclusive.
unicodeblocks.Block.end
The last codepoint mapped by block. Inclusive.
unicodeblocks.Block.__contains__(self, chr)
Checks whether the character is in this block.
unicodeblocks.Block.__len__(self)
Count of codepoints mapped by the block.
unicodeblocks.Block.__lt__(self, other)
Checks if both other.start and other.end are lower than self.start and self.end.
unicodeblocks.Block.__gt__(self, other)
Checks if both other.start and other.end are greater than self.start and self.end.
unicodeblocks.Block.__eq__(self, other)
Checks if other.start equals self.start and other.end equals self.end.
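The attributes and checks described above can be pictured as an inclusive codepoint range. The following is only a sketch under that assumption, not the module's implementation: SketchBlock is a hypothetical stand-in class, and it covers containment, length and equality only.

```python
class SketchBlock:
    """Hypothetical stand-in for a block: an inclusive codepoint range."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end

    def __contains__(self, char):
        # A character belongs to the block if its codepoint is in [start, end].
        return self.start <= ord(char) <= self.end

    def __len__(self):
        # Both bounds are inclusive, hence the + 1.
        return self.end - self.start + 1

    def __eq__(self, other):
        # Two blocks are equal when they cover the same range.
        return (self.start, self.end) == (other.start, other.end)

    def __repr__(self):
        return "SketchBlock(%r, %#x, %#x)" % (self.name, self.start, self.end)


hiragana = SketchBlock('Hiragana', 0x3040, 0x309f)
print('か' in hiragana)   # True
print('a' in hiragana)    # False
print(len(hiragana))      # 96
```

Because both bounds are inclusive, a block running from 0x3040 to 0x309f maps 96 codepoints rather than 95.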
unicodeblocks.blockof(chr)
Returns the Block that maps the codepoint of chr, or None if no block maps the codepoint.
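A lookup of this kind can be done with a binary search over block start points. The sketch below illustrates that idea with the standard bisect module. It is an assumption about one possible approach, not a description of how unicodeblocks works internally. SAMPLE_BLOCKS and sketch_blockof are hypothetical names, and the table holds only the three blocks quoted in this document.

```python
import bisect

# (start, end, name) tuples sorted by start. Only a sample, not the full
# Unicode block list.
SAMPLE_BLOCKS = [
    (0x0000, 0x007f, 'Basic Latin'),
    (0x3040, 0x309f, 'Hiragana'),
    (0x4e00, 0x9fff, 'CJK Unified Ideographs'),
]
STARTS = [start for start, _end, _name in SAMPLE_BLOCKS]


def sketch_blockof(char):
    """Return the (start, end, name) tuple covering char, or None."""
    codepoint = ord(char)
    # Index of the last block whose start is <= codepoint.
    index = bisect.bisect_right(STARTS, codepoint) - 1
    if index < 0:
        return None
    _start, end, _name = SAMPLE_BLOCKS[index]
    # The candidate only matches if the codepoint is not past its end.
    return SAMPLE_BLOCKS[index] if codepoint <= end else None


print(sketch_blockof('か'))  # (12352, 12447, 'Hiragana')
print(sketch_blockof('é'))   # None, its block is not in the sample table
```

Checking the end bound matters because blocks need not be contiguous: a codepoint can sit after one block's end and before the next block's start.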
unicodeblocks.blocks
A dictionary-like collection of all blocks defined by Unicode.
unicodeblocks.blocks.names()
Returns a list of the names of the blocks in the dictionary. Use this instead of .keys() if you want names presentable to the user.
Some use cases
Find the block a character belongs to
>>> import unicodeblocks
>>> unicodeblocks.blockof('-')
Block('Basic Latin', 0x0, 0x7f)
>>> unicodeblocks.blockof('か')
Block('Hiragana', 0x3040, 0x309f)
>>> unicodeblocks.blockof('日')
Block('CJK Unified Ideographs', 0x4e00, 0x9fff)
Number of codepoints covered by all blocks
>>> import itertools
>>> len(list(itertools.chain(*unicodeblocks.blocks.values())))
256336
Notes
The module doesn't check whether codepoints within a block are assigned. For example, U+038D ('\u038d') lies inside the Greek and Coptic block but is not an assigned character. If you care about that, try to obtain the character's name with the unicodedata module.
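A check along these lines needs only the standard library. The sketch below is one possible way to do it. is_assigned is a hypothetical helper, not part of unicodeblocks. It tests the general category instead of the name, because a name lookup also comes back empty for control characters such as newline.

```python
import unicodedata


def is_assigned(char):
    """True unless the codepoint has general category Cn (unassigned)."""
    return unicodedata.category(char) != 'Cn'


print(is_assigned('\u038c'))             # True
print(is_assigned('\u038d'))             # False
print(unicodedata.name('\u038d', None))  # None
print(unicodedata.name('\n', None))      # None, although newline is assigned
```

Passing a default as the second argument to unicodedata.name avoids the ValueError it otherwise raises for characters without a name.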