spacy-alignments: Align tokenizations for spaCy + transformers
A spaCy package for Yohei Tamura's Rust tokenizations library with Python bindings.
Installation
pip install -U pip setuptools wheel
pip install spacy-alignments
If no binary wheel is available for your platform, you will need to install
Rust in order to build
spacy-alignments
from source.
spacy-alignments vs. pytokenizations
The spacy_alignments
module is a drop-in replacement for tokenizations
:
import spacy_alignments as tokenizations
a2b, b2a = tokenizations.get_alignments(["Ã¥", "BC"], ["abc"])
assert a2b == [[0], [0]]
assert b2a == [[0, 1]]
The only difference between this package and the original
pytokenizations
is that it
switches the build system to setuptools-rust
to make it easier for us at
Explosion to build source and binary packages for a wider range of platforms.
Bug reports and other issues
Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.