wordpieces

Split tokens into word pieces

Keywords: word, tokenization, piece, wordpiece, rust
License: MIT/Apache-2.0
This crate provides a subword tokenizer. A subword tokenizer splits a token into several smaller units, so-called word pieces. Word pieces were popularized by, and are used in, the BERT natural language encoder.
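To illustrate the idea, here is a minimal sketch of BERT-style wordpiece splitting: greedy longest-match-first segmentation against a vocabulary, where continuation pieces carry a `##` prefix. This is not the crate's actual API; the function name `word_pieces` and the vocabulary are made up for illustration, and the sketch assumes ASCII input (byte-wise slicing would need char-boundary handling for general Unicode).

```rust
use std::collections::HashSet;

/// Greedy longest-match-first wordpiece split (BERT-style sketch).
/// Returns `None` when the token cannot be fully segmented.
/// Assumes ASCII input; real tokenizers must respect char boundaries.
fn word_pieces(token: &str, vocab: &HashSet<String>) -> Option<Vec<String>> {
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < token.len() {
        // Search for the longest vocabulary entry starting at `start`.
        let mut found = None;
        let mut end = token.len();
        while end > start {
            // Non-initial pieces carry the "##" continuation prefix.
            let candidate = if start == 0 {
                token[start..end].to_string()
            } else {
                format!("##{}", &token[start..end])
            };
            if vocab.contains(&candidate) {
                found = Some((candidate, end));
                break;
            }
            end -= 1;
        }
        // No piece matched: the token is not representable in this vocabulary.
        let (piece, next) = found?;
        pieces.push(piece);
        start = next;
    }
    Some(pieces)
}

fn main() {
    let vocab: HashSet<String> = ["un", "##believ", "##able"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Splits "unbelievable" into ["un", "##believ", "##able"].
    println!("{:?}", word_pieces("unbelievable", &vocab));
}
```

Greedy longest-match is the standard inference-time strategy for wordpiece vocabularies: it keeps pieces as long as possible, so frequent words stay whole and only rare words are split.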