practNLPTools-lite
Creating practNLPTools in lite mode. [The old code is available in devbranch, and the old stable code in properbranch.]
- Clicking the build badge might take you to the build of practNLPTools, which is the testing ground for this repository, so don’t worry.
Practical Natural Language Processing Tools for Humans. practNLPTools is a pythonic library over SENNA and Stanford Dependency Extractor.
name | status |
---|---|
Wercker status | |
PyPI | |
Travis | |
Documentation | |
dependency | |
blocker Pyupbot | |
FOSSA | |
- Documentation: https://pntl.readthedocs.io
QuickStart
Downloading Stanford Parser JAR
To download the Stanford Parser JAR from GitHub automatically and place it inside the install directory:
pntl -I true
# downloads the required files from GitHub.
Running Predefined Example Sentences
To run the existing examples in batch mode (which has more than one list of examples):
pntl -SE home/user/senna -B true
To run the predefined example for one sentence:
pntl -SE home/user/senna
Running a User-Given Sentence
To run a sentence of your own, pass it with -S:
pntl -SE home/user/senna -S 'I am gonna make him an offer he can not refuse.'
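The three invocations above can also be assembled from Python, for example when scripting several runs. This is a minimal sketch: `build_pntl_command` is a hypothetical helper, not part of pntl, and it only uses the `-SE`, `-S` and `-B` flags shown in this README. Combining `-S` with `-B` is not documented here, so the sketch rejects it rather than guess.

```python
import shlex


def build_pntl_command(senna_path, sentence=None, batch=False):
    """Build the argv list for a pntl run (hypothetical helper, not part of pntl)."""
    if sentence is not None and batch:
        # This README never shows -S and -B together, so do not guess.
        raise ValueError("pass either a sentence or batch=True, not both")
    argv = ["pntl", "-SE", senna_path]
    if sentence is not None:
        argv += ["-S", sentence]  # user-given sentence
    if batch:
        argv += ["-B", "true"]  # predefined batch examples
    return argv


if __name__ == "__main__":
    cmd = build_pntl_command(
        "home/user/senna",
        sentence="I am gonna make him an offer he can not refuse.",
    )
    # Print a shell-quoted form; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Passing the list straight to `subprocess.run` avoids shell quoting problems with apostrophes in the sentence.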
Functionality
- Semantic Role Labeling.
- Syntactic Parsing.
- Part of Speech Tagging (POS Tagging).
- Named Entity Recognition (NER).
- Dependency Parsing.
- Shallow Chunking.
- Skip-gram (in-case).
- Finds the SENNA path if it is installed on the system.
- Places the Stanford parser and depParser files into the install directory.
Future work
- tag2file(new)
- creating depParser for the corresponding OS environment
- custom input format for the Stanford parser instead of tree format
Features
- Fast: SENNA is written in C, so it is fast.
- We use only the dependency extractor component of the Stanford Parser, which takes the syntactic parse from SENNA and applies dependency extraction. So there is no need to load parsing models for the Stanford Parser, which takes time.
- Easy to use.
- Platforms supported: Windows, Linux and Mac.
- Automatically finds the Stanford parser JAR if it is present in the install path [pntl].
Note
The SENNA pipeline has a fixed maximum sentence size that it can read. By default it is 1024 tokens per sentence. If you have longer sentences, consider changing the MAX_SENTENCE_SIZE value in SENNA_main.c and rebuilding the binary for your system. Otherwise, misalignment errors could be introduced.
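A quick pre-check can flag sentences that risk hitting this limit before they reach SENNA. The sketch below is an assumption-laden illustration: `oversized_sentences` is a hypothetical helper, and it counts whitespace-separated tokens, which only approximates SENNA's own tokenization (SENNA may produce more tokens, for instance around punctuation). Whether a sentence of exactly 1024 tokens is accepted is also assumed here, so a lower `limit` gives a safety margin.

```python
MAX_SENTENCE_SIZE = 1024  # default value quoted in the note above


def oversized_sentences(sentences, limit=MAX_SENTENCE_SIZE):
    """Return (index, token_count) for each sentence with more than `limit` tokens.

    Tokens are counted by whitespace splitting, which is only an
    approximation of SENNA's tokenizer (assumption of this sketch).
    """
    flagged = []
    for index, sentence in enumerate(sentences):
        count = len(sentence.split())
        if count > limit:
            flagged.append((index, count))
    return flagged


if __name__ == "__main__":
    docs = ["A short sentence.", "word " * 1500]
    for index, count in oversized_sentences(docs):
        print(f"sentence {index} has about {count} tokens; split it or rebuild SENNA")
```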
Installation
Requires:
A computer with 500 MB of memory, an installed Java Runtime Environment (1.7 preferably; 1.6 should also work, but this is untested), and Python.
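Before installing, it can help to confirm that a Java runtime is reachable. The following is a minimal sketch using only the Python standard library; `java_on_path` is a hypothetical helper, and it checks only that a `java` executable is on the PATH, not which version it is.

```python
import shutil


def java_on_path():
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    if java_on_path():
        print("java found on PATH")
    else:
        print("java not found; install a Java Runtime Environment first")
```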
Linux:
run:
sudo python setup.py install
Windows:
run this command as administrator:
python setup.py install
Benchmark comparison
Measured with the time command on Ubuntu, running testsrl.py from this link alongside tools.py from pntl.
run | time | pntl | NLTK-senna |
---|---|---|---|
first | real | 0m1.674s | 0m2.484s |
first | user | 0m1.564s | 0m1.868s |
first | sys | 0m0.228s | 0m0.524s |
second | real | 0m1.245s | 0m3.359s |
second | user | 0m1.560s | 0m2.016s |
second | sys | 0m0.152s | 0m1.168s |
Note
This benchmark may differ according to the system's workload. The results presented here are exactly those obtained on my system: Ubuntu, 4 GB RAM and an i3 processor. If I find a better benchmark technique, I will switch to it.
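To repeat a comparison like this on another machine, the wall-clock part of the measurement can be approximated from Python. This is a rough sketch, not the method used for the table: `time_command` is a hypothetical helper, it reports only elapsed (real) time and not the user and sys figures that the time command prints, and the command timed in the demo is a placeholder.

```python
import subprocess
import sys
import time


def time_command(argv, runs=2):
    """Run `argv` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; swap in the script you want to time,
    # for example [sys.executable, "testsrl.py"].
    for run, seconds in enumerate(time_command([sys.executable, "-c", "pass"]), start=1):
        print(f"run {run}: real {seconds:.3f}s")
```

Running each script more than once, as in the table, helps separate first-run costs such as disk caching from steady-state time.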
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.