nlpTurk - Turkish NLP library

nlpTurk is an open-source Turkish NLP library that provides machine-learning-based models for sentence boundary detection, lemmatization, and POS tagging.

Installation & Usage

nlpTurk can be installed from PyPI.

pip install nlpturk

nlpTurk offers a simple API to extract sentences, lemmas and POS tags.

import nlpturk

text = "Sosyal medya hayatımıza hızlı girdi.ama yazım kurallarına dikkat eden pek yok :)"
doc = nlpturk(text)

# iterate over tokens
for token in doc:
    print(f"token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

"""
Prints:
  token: Sosyal, lemma: sosyal, pos: ADJ
  token: medya, lemma: medya, pos: NOUN
  ...
"""

# or get tokens by token ids
token = doc[5]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")
token = doc[6]
print(f"token: {token.text}, sent_start: {token.is_sent_start}, sent_end: {token.is_sent_end}")

"""
Prints:
  token: ., sent_start: False, sent_end: True
  token: ama, sent_start: True, sent_end: False
"""

# iterate over sentences
for i, sent in enumerate(doc.sents):
    print(f"sentence #{i+1}: {sent.text}")
    for token in sent:
        print(f"  token: {token.text}, lemma: {token.lemma}, pos: {token.pos}")

"""
Prints:
  sentence #1: Sosyal medya hayatımıza hızlı girdi.
    token: Sosyal, lemma: sosyal, pos: ADJ
    ...
  sentence #2: ama yazım kurallarına dikkat eden pek yok :)
    token: ama, lemma: ama, pos: CCONJ
    ...
"""
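
The token attributes shown above (`text`, `lemma`, `pos`) are enough for simple aggregate statistics. The following is a minimal sketch, not part of nlpTurk: `StubToken`, `pos_counts` and `lemmas_with_pos` are hypothetical names introduced here for illustration, and the stub tokens only mirror the printed output above so the snippet runs without the library installed. It assumes that a `doc` or a sentence from `doc.sents` can be iterated the same way as the stub list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubToken:
    """Hypothetical stand-in for an nlpTurk token (not the library's class)."""
    text: str
    lemma: str
    pos: str


def pos_counts(tokens):
    """Count how often each POS tag occurs in an iterable of token-like objects."""
    return Counter(token.pos for token in tokens)


def lemmas_with_pos(tokens, pos):
    """Return the lemmas of all tokens carrying the given POS tag, in order."""
    return [token.lemma for token in tokens if token.pos == pos]


# Stub data copied from the example output above; with nlpTurk installed,
# pass `doc` (or a sentence from `doc.sents`) instead of `sample`.
sample = [
    StubToken("Sosyal", "sosyal", "ADJ"),
    StubToken("medya", "medya", "NOUN"),
    StubToken("ama", "ama", "CCONJ"),
]

print(pos_counts(sample))
print(lemmas_with_pos(sample, "NOUN"))
```

Because both helpers rely only on attribute access, they should work with any object exposing `lemma` and `pos`, which keeps them easy to test without loading the models.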

Performance

The models were evaluated on the test dataset. Detailed evaluation and benchmarking results can be found here.

Model               accuracy  precision  recall  f1-score
Sentence Segmenter  -         98.09      96.05   97.06
POS Tagger          -         95.75      96.26   96.01
Lemmatizer          96.87     -          -       -
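
As a reading aid for the table above, the f1-score is conventionally the harmonic mean of precision and recall. The sketch below uses a hypothetical helper, `f1_score`, to recompute the Sentence Segmenter value from its precision and recall. How the published figures were actually averaged is not stated here, so treat this as a sanity check rather than the project's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (output is on the same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the Sentence Segmenter row of the table.
segmenter_f1 = f1_score(98.09, 96.05)
print(f"{segmenter_f1:.2f}")  # prints 97.06
```

Applied to the POS Tagger row, the same formula gives 96.00 rather than the reported 96.01. A small gap like this can appear when scores are averaged per tag before reporting, though that explanation is an assumption.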

You can also run the benchmark on your own dataset.

git clone https://github.com/nlpturk/nlpturk.git
cd nlpturk
pip install -r requirements.txt
python -m nlpturk benchmark --data_path path/to/data --output_path path/to/output

nlpturk
Release 0.0.1
