Grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora.


Keywords
triples, svo, action, agency
License
Other
Install
pip install posextract==1.2.3

Documentation

posextract

posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts-of-speech and returns sequences of words that share a grammatical relationship. See our article for more. You can also download posextract for pypi with pip.

Usage

  • extract_triples to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triples
  • extract_adj_noun_pairs to extract adjective-noun pairs
  • extract_subj_verb_pairs to extract subject-verb pairs

Required Paramters:

  • input can be the name of a csv file or an input string
  • output name of the output file

Optional Paramters:

  • --data_column specify the column to extract triples from
  • --id_column specify a unique ID field if csv file is given
  • --lemma specify whether to lemmatize parts-of-speech
  • --post-combine-adj combine triples (adjective predicate with object)

Examples

Interactive:

Extract grammatical triples.

from posextract import extract_triples

triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])

for triple in triples:
    print(triple)

# Output: Landlords exercise oppression, soldiers were ill

Or extract adjectives and the nouns they modify.

from posextract import adj_noun_pairs

adj_noun = adj_noun_pairs.extract()

Or extract subjects and their verbs.

from posextract import subj_verb_pairs

subj_verb = subj_verb_pairs.extract()

Over CLI:

posextract can extract grammatical triples from text:

python -m posextract.extract_triples "Landlords may exercise oppression." output.csv

# Output: Landlords exercise oppression

posextract can extract SVO/SVA relationships separately or it can combine the adjective as part of a SVO triple:

python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj

# Output: soldiers were terminally, soldiers were ill 
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj

# Output: soldiers were terminally ill

If provided a .csv file:

python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv

For More Information...

... see our Wiki: