segmentation

Unicode text segmentation tr29


Keywords
unicode, text-segmentation, nim, word-break
License
MIT

Segmentation

An implementation of Unicode Text Segmentation (tr29). Text is split by a fast DFA (deterministic finite automaton).
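To make "split by a DFA" more concrete, here is a minimal, hypothetical sketch of the technique. It is not this library's code, and its tables are not the library's tables. It uses a hand-written transition table over five toy character classes (a rough stand-in for the full tr29 Word_Break property) and applies maximal munch: run the automaton until it has no transition, emit the longest accepted prefix, then restart. The names `CharClass`, `State`, `transitions`, `classify` and `splitWords` are all made up for illustration.

```nim
import std/unicode

type
  CharClass = enum   # toy subset of the tr29 Word_Break classes (assumption)
    ccLetter, ccDigit, ccMid, ccMidNum, ccOther
  State = enum
    sDead, sStart, sLetter, sDigit, sLetterMid, sDigitMid, sSingle

const
  # transitions[state][class] is the next state; sDead means "no transition".
  transitions: array[State, array[CharClass, State]] = [
    sDead:      [ccLetter: sDead,   ccDigit: sDead,  ccMid: sDead,      ccMidNum: sDead,     ccOther: sDead],
    sStart:     [ccLetter: sLetter, ccDigit: sDigit, ccMid: sSingle,    ccMidNum: sSingle,   ccOther: sSingle],
    sLetter:    [ccLetter: sLetter, ccDigit: sDigit, ccMid: sLetterMid, ccMidNum: sDead,     ccOther: sDead],
    sDigit:     [ccLetter: sLetter, ccDigit: sDigit, ccMid: sDigitMid,  ccMidNum: sDigitMid, ccOther: sDead],
    sLetterMid: [ccLetter: sLetter, ccDigit: sDead,  ccMid: sDead,      ccMidNum: sDead,     ccOther: sDead],
    sDigitMid:  [ccLetter: sDead,   ccDigit: sDigit, ccMid: sDead,      ccMidNum: sDead,     ccOther: sDead],
    sSingle:    [ccLetter: sDead,   ccDigit: sDead,  ccMid: sDead,      ccMidNum: sDead,     ccOther: sDead]]
  # States in which the runes consumed so far form a complete segment.
  accepting = {sLetter, sDigit, sSingle}

proc classify(r: Rune): CharClass =
  ## Toy classification; real tr29 looks up the Word_Break property.
  let code = int(r)
  if code in ord('0')..ord('9'): ccDigit
  # '.', apostrophe and U+2019 join letter-letter and digit-digit.
  elif code == ord('.') or code == ord('\'') or code == 0x2019: ccMid
  # ',' joins digit-digit only.
  elif code == ord(','): ccMidNum
  elif r.isAlpha: ccLetter
  else: ccOther

proc splitWords(text: string): seq[string] =
  ## Maximal munch: emit the longest prefix that ends in an accepting state.
  let runes = text.toRunes
  var start = 0
  while start < runes.len:
    var state = sStart
    # From sStart every class reaches an accepting state, so at least
    # one rune is always accepted and the outer loop always advances.
    var lastAccept = start
    var i = start
    while i < runes.len:
      state = transitions[state][classify(runes[i])]
      if state == sDead: break
      inc i
      if state in accepting: lastAccept = i
    result.add $runes[start ..< lastAccept]
    start = lastAccept

when isMainModule:
  echo splitWords("The (“brown”) fox can’t jump 32.3 feet, right?")
```

The non-accepting "mid" states are how a DFA handles rules that need one rune of lookahead, such as keeping "32.3" together but splitting "32. " into "32", "." and " ". On the README sentence above, this toy happens to produce the same segments as `words`. It will diverge on input that exercises rules it omits, such as Katakana, extenders, or regional indicators.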

For grapheme cluster segmentation, see nim-graphemes.

Install

nimble install segmentation

Compatibility

Nim 0.19, 0.20, and 1.0.4 or later

Usage

import sequtils
import segmentation

assert toSeq("The (“brown”) fox can’t jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]
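As the example shows, `words` yields every segment, including spaces and punctuation. If only word-like tokens are needed, one option is to filter the segments afterwards. The helpers `isWordLike` and `keepWords` below are hypothetical and not part of this library. They use a simple heuristic: keep a segment if it contains at least one alphabetic rune or ASCII digit. To stay self-contained, the sketch filters a copy of the expected segment list shown above. With the library installed you would pass `toSeq(text.words)` instead.

```nim
import std/unicode

proc isWordLike(segment: string): bool =
  ## Hypothetical heuristic: true if the segment holds an ASCII digit
  ## or an alphabetic rune, i.e. it is not pure whitespace/punctuation.
  for c in segment:
    if c in {'0'..'9'}:
      return true
  for r in segment.runes:
    if r.isAlpha:
      return true
  false

proc keepWords(segments: openArray[string]): seq[string] =
  ## Drops whitespace and punctuation segments, preserving order.
  for s in segments:
    if s.isWordLike:
      result.add s

# Segments copied from the expected output of the `words` example.
let segments = @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
  "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
  "right", "?"]
echo keepWords(segments)
```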

Docs

Read the docs

Tests

nimble test

LICENSE

MIT