informationminer

Automatically performs NLP techniques. Currently supports German and English language.

The following techniques are used on the passed text:

Tokenization
POS Tagging
- English tagger is based on NLTK default
- German tagger is generated from TIGER corpus
Chunking
Named Entity recognition

Install

This package is on pip ! Just use pip3 install informationminer.

Getting started

Look at the following example. More complex tasks like creating your own Tagger will be added later.

>>> import informationminer
>>> im = informationminer.InformationMiner("This is a samelp sentence. I love InformationMiner !")
>>> im.process()
INFO:root:Start processing text
INFO:root:Tokenizing text
INFO:root:Creating new tokens
INFO:root:Writing output/01_token_output.json
INFO:root:POS tagging tokens
INFO:root:Creating new POS tags. This can take some time ...
INFO:root:Writing output/02_pos_output.json
INFO:root:Chunking POS
INFO:root:Creating new chunks. This can take some time ...
INFO:root:Writing output/03_chunk_output.pickle
INFO:root:Extracting entity names
INFO:root:Searching for named entities
INFO:root:Writing output/04_ne_output.json
INFO:root:Processing finished in 0.24 s
>>> im.tokens
['This', 'is', 'a', 'sample', 'sentence', '.', 'I', 'love', 'InformationMiner', '!']
>>> im.pos
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('sample', 'JJ'), ('sentence', 'NN'), ('.', '.'), ('I', 'PRP'), ('love', 'VBP'), ('InformationMiner', 'NN'), ('!', '.')]
>>> im.ne
['InformationMiner']

The InformationMiner class has a couple of optional parameters:

save_output: Write output to outdir/outfile. Enable this, so you don't do work twice.
force_create: Will allways overwrite files if save_output is enabled
language: English by default. Currently either ger or en

Common Hickups

Please propose design changes to fix those if you have a great idea ! :)

Existing files in the output directory are always used, ignoring given text
save_output / force_create have some strange interaction

informationminer
Release 1.6.5

Release 1.6.5

1.6.8

1.6.7

1.6.6

1.6.5

1.6.4

1.6.3

1.6.2

1.6.1

1.6

1.5

Documentation

informationminer

Install

Getting started

Common Hickups

Stats

Development practices

Releases

Contributors

informationminer Release 1.6.5

Release 1.6.5 Toggle Dropdown 1.6.8 1.6.7 1.6.6 1.6.5 1.6.4 1.6.3 1.6.2 1.6.1 1.6 1.5

Documentation

informationminer

Install

Getting started

Common Hickups

Stats

Development practices

Releases

Contributors

informationminer
Release 1.6.5

Release 1.6.5

1.6.8

1.6.7

1.6.6

1.6.5

1.6.4

1.6.3

1.6.2

1.6.1

1.6

1.5