docuscospacy: Support for spaCy models trained on DocuScope and the CLAWS7 tagset

The docuscospacy package contains a set of functions to facilitate the processing of tagged corpora using:

en_docusco_spacy -- a spaCy model trained on the CLAWS7 tagset and DocuScope; and
tmtoolkit -- a set of tools for text mining and topic modeling

The documentation for docuscospacy is available on docuscospacy.readthedocs.org and the GitHub code repository is on github.com/browndw/docuscospacy.

Requirements and installation

docuscospacy works with Python 3.8 or newer (tested up to Python 3.10). It also requires spacy >= 3.3.
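
The stated minimums can be checked from Python before installing anything else. This is a minimal sketch using only the standard library. The helper names (release_tuple, meets_minimum, check_requirements) are hypothetical and not part of docuscospacy. The version parsing also only compares the leading numeric release segment, so it does not implement full PEP 440 ordering.

```python
from __future__ import annotations

import re
import sys
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '3.4.1rc1' -> (3, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """True if `version` is at least `minimum`, using tuple comparison."""
    return release_tuple(version) >= minimum


def check_requirements() -> list[str]:
    """Collect problems with the Python >= 3.8 and spacy >= 3.3 requirements."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    try:
        spacy_version = metadata.version("spacy")
    except metadata.PackageNotFoundError:
        problems.append("spacy is not installed")
    else:
        if not meets_minimum(spacy_version, (3, 3)):
            problems.append(f"spacy >= 3.3 required, found {spacy_version}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "Python and spaCy versions meet the stated minimums.")
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "3.10" as older than "3.3".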

The recommended way of installing docuscospacy is to:

create and activate a Python virtual environment ("venv")
install spacy and tmtoolkit with a recommended set of dependencies
download the en_docusco_spacy model
install docuscospacy

pip install docuscospacy

Features

Corpus analysis

The docuscospacy package supports the post-tagging generation of:

Outputs can be organized either by part-of-speech (CLAWS7) tag or by DocuScope tag. Thus, for example, "can" as a noun and "can" as a verb are counted separately.
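
To illustrate why keying counts on tags disambiguates homographs, here is a minimal sketch that counts (word, tag) pairs instead of bare words. The tagged tokens are a hypothetical hand-tagged example using CLAWS7 tags (VM for a modal verb, NN1 for a singular common noun), not actual en_docusco_spacy output, and frequency_by_tag is an illustrative helper rather than part of the docuscospacy API.

```python
from collections import Counter

# Hypothetical hand-tagged tokens as (word, CLAWS7 tag) pairs; illustrative only.
tagged = [
    ("I", "PPIS1"), ("can", "VM"), ("open", "VVI"), ("the", "AT"),
    ("can", "NN1"), ("and", "CC"), ("you", "PPY"), ("can", "VM"),
    ("recycle", "VVI"), ("it", "PPH1"),
]


def frequency_by_tag(tokens):
    """Count (lowercased word, tag) pairs so homographs stay distinct."""
    return Counter((word.lower(), tag) for word, tag in tokens)


counts = frequency_by_tag(tagged)
for (word, tag), n in counts.most_common():
    print(f"{word}\t{tag}\t{n}")
```

The same keying works with a DocuScope category in place of the part-of-speech tag; only the second element of each pair changes.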

Additionally, tagged multi-token sequences are aggregated for analysis. For example, where "in spite of" is tagged as a single multi-token sequence, its three tokens are combined into one.
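
The aggregation idea can be illustrated with CLAWS7 "ditto tags", in which each word of a multi-word unit carries the base tag plus two digits (total length, then position); "in spite of" is tagged II31 II32 II33. This sketch assumes tokens arrive as (word, tag) pairs carrying such ditto tags. parse_ditto and merge_ditto are hypothetical helpers, and docuscospacy's own aggregation (including how it handles DocuScope sequences) may work differently.

```python
import re

# Ditto tag = base tag + total length digit + position digit, e.g. II31.
DITTO = re.compile(r"^(?P<base>.+?)(?P<total>\d)(?P<pos>\d)$")


def parse_ditto(tag):
    """Return (base, total, position) for a ditto tag, else None."""
    m = DITTO.match(tag)
    if m is None:
        return None
    total, pos = int(m["total"]), int(m["pos"])
    if total < 2 or not 1 <= pos <= total:
        return None
    return m["base"], total, pos


def merge_ditto(tokens):
    """Collapse complete ditto-tagged runs into single (text, base_tag) tokens."""
    merged = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        parsed = parse_ditto(tag)
        if parsed and parsed[2] == 1:
            base, total, _ = parsed
            span = tokens[i:i + total]
            expected = [f"{base}{total}{k}" for k in range(1, total + 1)]
            if [t for _, t in span] == expected:
                merged.append((" ".join(w for w, _ in span), base))
                i += total
                continue
        # Not the start of a complete run: keep the token unchanged.
        merged.append((word, tag))
        i += 1
    return merged


tokens = [("They", "PPHS2"), ("left", "VVD"), ("in", "II31"), ("spite", "II32"),
          ("of", "II33"), ("the", "AT"), ("rain", "NN1")]
print(merge_ditto(tokens))
```

Malformed or truncated runs are left token by token rather than merged, so the function never drops words.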

Other features

KWIC (key word in context) tables that locate a node word in a center column, with context columns on either side
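
A KWIC table places each occurrence of the node word in a center column with a fixed window of context on either side. This is a minimal sketch over a plain list of tokens. The kwic function and its width parameter are hypothetical; docuscospacy's own KWIC function may take different arguments (for example, a tagged corpus) and return a different structure.

```python
def kwic(tokens, node, width=3):
    """Return (left, node, right) rows for each case-insensitive match of `node`."""
    target = node.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # Up to `width` tokens of context on each side, clipped at the edges.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


tokens = "You can open the can if you can find the opener".split()
rows = kwic(tokens, "can", width=3)
for left, node, right in rows:
    print(f"{left:>20} | {node} | {right}")
```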

Limits

The model this package is designed for has been trained only on English.
All data must reside in memory; large data cannot be streamed from disk (as Gensim, for example, allows).

License

Code licensed under Apache License 2.0. See LICENSE file.

docuscospacy
Release 0.2.3
