doc2term

A fast NLP tokenizer that detects tokens and removes duplicates and punctuation


Keywords
tokenizer, NLP, punctuation, standardization, duplicate-detection, text-processing, text-tokenizing, doc2term
License
Apache-2.0
Install
pip install doc2term==0.1

Documentation

doc2term


A fast NLP tokenizer that detects sentences, words, numbers, URLs, hostnames, emails, filenames, and phone numbers. It standardizes documents into a consistent tokenized form, removing punctuation and duplicates.

Installation

pip install doc2term

Compilation

Installation requires compiling the original C code with gcc.
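If no prebuilt wheel is available for your platform, pip compiles the C extension during installation, so gcc must be on your PATH. A minimal sketch; the CC environment variable is a general setuptools/distutils convention, not a doc2term-specific option:

CC=gcc pip install doc2term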

Usage

Example notebook: doc2term

Example

>>> import doc2term

>>> doc2term.doc2term_str("Actions speak louder than words. ... ")
"Actions speak louder than words ."
>>> doc2term.doc2term_str("You can't judge a book by its cover. ... from thoughtcatalog.com")
"You can't judge a book by its cover . from thoughtcatalog.com"