spacy_hunspell: Hunspell extension for spaCy
This package uses the spaCy 2.0 extensions to add Hunspell support for spellchecking. Inspired by this discussion here.
Usage
Add the spaCyHunSpell component to the spaCy pipeline.
import spacy
from spacy_hunspell import spaCyHunSpell
nlp = spacy.load('en_core_web_sm')
hunspell = spaCyHunSpell('mac')
nlp.add_pipe(hunspell)
doc = nlp('I can haz cheezeburger.')
haz = doc[2]
haz._.hunspell_spell # False
haz._.hunspell_suggest # ['ha', 'haze', 'hazy', 'has', 'hat', 'had', 'hag', 'ham', 'hap', 'hay', 'haw', 'ha z']
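To go beyond a single token, the two extension attributes shown above can be combined to list every misspelled token with a few suggestions. This is a minimal sketch, not part of the package: `collect_misspellings` and `_stub_token` are hypothetical helpers, and the stub tokens only imitate the `.text`, `._.hunspell_spell`, and `._.hunspell_suggest` attributes so the example runs without spaCy or Hunspell installed.

```python
from types import SimpleNamespace


def collect_misspellings(tokens, max_suggestions=3):
    """Map each misspelled token's text to its first few Hunspell suggestions.

    `tokens` is any iterable of objects exposing `.text`,
    `._.hunspell_spell`, and `._.hunspell_suggest`, for example a spaCy Doc
    processed by a pipeline that includes spaCyHunSpell.
    """
    misspellings = {}
    for token in tokens:
        # Report only tokens explicitly flagged as misspelled (False).
        if token._.hunspell_spell is False:
            suggestions = list(token._.hunspell_suggest)
            misspellings[token.text] = suggestions[:max_suggestions]
    return misspellings


def _stub_token(text, ok, suggestions=()):
    """Build a stand-in for a spaCy token (hypothetical, demonstration only)."""
    return SimpleNamespace(
        text=text,
        _=SimpleNamespace(hunspell_spell=ok, hunspell_suggest=list(suggestions)),
    )


stub_doc = [
    _stub_token("I", True),
    _stub_token("can", True),
    _stub_token("haz", False, ["ha", "haze", "hazy", "has"]),
]
print(collect_misspellings(stub_doc))  # {'haz': ['ha', 'haze', 'hazy']}
```

With the real pipeline, the same helper would be called as `collect_misspellings(doc)` on the `doc` from the usage example.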
There is a default location for Hunspell dictionaries on each of two platforms (mac and linux). If your dictionaries are not there, you can specify the two files (.dic and .aff) manually.
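hunspell = spaCyHunSpell('mac')  # default dictionary location on Mac
hunspell = spaCyHunSpell('linux')  # default dictionary location on Linux
hunspell = spaCyHunSpell('en_US.dic', 'en_US.aff')  # explicit dictionary and affix files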
You can find the English dictionary files here.
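When passing the two files manually, it can help to confirm that both exist before constructing the component. The sketch below is an illustration under stated assumptions: `find_dictionary_files` is a hypothetical helper, the directories it searches are supplied by the caller, and it assumes the files are named `<lang>.dic` and `<lang>.aff` as in the `en_US` example above.

```python
import os
import tempfile


def find_dictionary_files(directories, lang="en_US"):
    """Return (dic_path, aff_path) from the first directory holding both files.

    Returns None when no directory contains both `<lang>.dic` and `<lang>.aff`.
    """
    for directory in directories:
        dic_path = os.path.join(directory, lang + ".dic")
        aff_path = os.path.join(directory, lang + ".aff")
        if os.path.isfile(dic_path) and os.path.isfile(aff_path):
            return dic_path, aff_path
    return None


# Demonstration: a temporary directory stands in for a real dictionary folder.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".dic", ".aff"):
        open(os.path.join(tmp, "en_US" + ext), "w").close()
    found = find_dictionary_files(["/nonexistent/dir", tmp])
    print(found is not None)  # True
```

The returned pair could then be passed as the two file arguments shown above.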
Installation
Installation is a little tricky for Hunspell. On Linux, make sure python-dev and libhunspell-dev are installed. On Mac, run brew install hunspell.
Install the Python bindings for Hunspell (pyhunspell) with pip install hunspell.
For Mac, you may have to add a few steps before pip installing:
export C_INCLUDE_PATH=/usr/local/include/hunspell
ln -s /usr/local/lib/libhunspell-{VERSION_NUMBER}.a /usr/local/lib/libhunspell.a
For Mac 10.13 High Sierra, you may have to set the C flags (issue).
CFLAGS=$(pkg-config --cflags hunspell) LDFLAGS=$(pkg-config --libs hunspell) pip install hunspell
Install the rest of the requirements.
pip install -r requirements.txt
And download at least one spaCy model.
python -m spacy download en_core_web_sm
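After these steps, a quick check that the relevant modules are importable can save debugging time later. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the module names are the import names used in the usage example plus `hunspell` from the pip step. It does not verify that a spaCy model has been downloaded.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# `hunspell` comes from `pip install hunspell`; `spacy` and `spacy_hunspell`
# are the import names used in the usage example.
missing = missing_modules(["hunspell", "spacy", "spacy_hunspell"])
print("Missing modules:", missing if missing else "none")
```

An empty result only shows that the modules can be located; building the Hunspell bindings can still fail at import time if the native library is absent.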