spacy-spanish-lemmatizer
Spanish rule-based lemmatization for spaCy
Steps to use the lemmatizer
1. Ensure you have installed spaCy by following the instructions at https://spacy.io/usage.
2. Ensure you have installed a spaCy Spanish model by following the instructions at https://spacy.io/usage/models.
3. Install the package via pip:
   pip install spacy_spanish_lemmatizer
4. Generate the lemmatization rules (this may take several minutes). NOTE: Currently, only lemmatization based on Wiktionary dump files is implemented. Due to licensing restrictions, the following command downloads the Wiktionary dump files and generates the lemmatization rules from them. By running it, you agree to the Wikimedia license.
   python -m spacy_spanish_lemmatizer download wiki
5. Use it in Python:
import spacy
import spacy_spanish_lemmatizer

# Change "es" to the Spanish model installed in step 2
nlp = spacy.load("es")
# Swap the default lemmatizer for the rule-based Spanish one
nlp.replace_pipe("lemmatizer", "spanish_lemmatizer")
for token in nlp(
"""Con estos fines, la Dirección de Gestión y Control Financiero monitorea
la posición de capital del Banco y utiliza los mecanismos para hacer un
eficiente manejo del capital."""
):
    print(token.text, token.lemma_)
Output (token text, then lemma):
Con con
estos este
fines fin
, ,
la el
Dirección dirección
de de
Gestión gestión
y y
Control control
Financiero financiero
monitorea monitorea
la el
posición posición
de de
capital capital
del del
Banco banco
y y
utiliza utilizar
los el
mecanismos mecanismo
para para
hacer hacer
un un
eficiente eficiente
manejo manejo
del del
capital capital
. .
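A common next step after printing lemmas is to aggregate them. The following is a minimal sketch, not part of this package: `lemma_counts` is a hypothetical helper that relies only on each token exposing `text` and `lemma_`, as in the example above. So that it runs without spaCy installed, it uses stand-in tokens copied from a few rows of the output above.

```python
from collections import Counter
from types import SimpleNamespace


def lemma_counts(tokens):
    """Count lowercased lemmas, skipping tokens with no alphabetic characters."""
    # Hypothetical helper for illustration; not provided by spacy_spanish_lemmatizer.
    return Counter(
        tok.lemma_.lower()
        for tok in tokens
        if any(ch.isalpha() for ch in tok.text)
    )


# Stand-in tokens mirroring rows of the output above (assumption: real spaCy
# tokens expose the same `text` and `lemma_` attributes).
sample = [
    SimpleNamespace(text="la", lemma_="el"),
    SimpleNamespace(text="los", lemma_="el"),
    SimpleNamespace(text="fines", lemma_="fin"),
    SimpleNamespace(text=".", lemma_="."),
]

counts = lemma_counts(sample)
print(counts.most_common(1))  # [('el', 2)]
```

With the real pipeline, the same helper should accept the document directly, for example `lemma_counts(nlp(text))`, since iterating over a spaCy `Doc` yields tokens.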