py-lex

UNKNOWN


License
Other
Install
pip install py-lex==0.2.1

Documentation

py-lex

Python libraries for parsing and applying emotion, socio-psycho, and sentiment datasets to your own text corpora.

Features

Analyzes at the unigram token of tokenized documents.

Supports datasets of the form of LIWC, NRC EmoLex, and NRC SentiLex. This library itself doesn't include any of these datasets you must procure or create them on your own.

Legal note: using the actual LIWC dataset outside of the application may be a violation of their terms of service. NRC datasets are free to use for research purposes, but require permission for commercial uses.

Requirements

Py-Lex has two dependencies marisa-trie and bidict, to install them simply run:

pip install -r ./requirements.txt

Usage

from py_lex import Liwc, EmoLex

# Instantiate reader from raw, local .dic file
lexicon = Liwc('./liwc.dic')
# Or use an NRC EmoLex lexicon, which implements the same API
# lexicon = EmoLex('./emo-lex.txt')

# Or from a previously dumped file for slightly faster instantiation (no parsing)
lexicon.dump('./lexicon.pickle')
lexiconP = Liwc()
lexiconP.load('./lexicon.pickle')

# tokenize your document of choice
document = nltk.tokenize.casual.casual_tokenize(a_str_document)
# => List[str]

summary = lexicon.summarize_doc(document)
# => Dict[str, Union[int, float]]
# Where str is the LIWC/EmoLex key

annotation = lexicon.annotate_doc(document)
# => List[Set[str]]
# Where each index aligns with the input List of words and str is the
# LIWC/EmoLex key for easy zipping.

# faster if you need both, since summarize_doc creates an annotation internally
summary = lexicon.summarize_annotation(annotation)
# => Dict[str, Union[int, float]]
# Where str is the LIWC/EmoLex key

# Get the set of keys being used by the given lexicon
lexicon.keys()
# => Set[str]

# Get the number of keys that the given lexicon will compute
len(lexicon)
# => int # 82