keyword-ranker

Python implementation for ranking keywords from a corpus with respect to other text files using the Rapid Automatic Keyword Extraction algorithm.


Keywords
nlp, text-mining, algorithms
License
MIT
Install
pip install keyword-ranker==0.2

Documentation

keyword_ranker

The Keyword Ranker uses the Rapid Automatic Keyword Extraction algorithm (RAKE) to extract the most relevant keywords from a reference corpus, and ranks them depending on the degree to which they are represented in other benchmark documents.
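To give a feel for the extraction step, here is a minimal, standard-library sketch of the RAKE scoring idea: candidate phrases are split at punctuation and stopwords, each word is scored by degree divided by frequency, and a phrase scores the sum of its words. This is not the code shipped in this package. The function names and the tiny stopword set are made up for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword set (an assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
             "of", "on", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, including itself.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stopwords)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))


result = rake("Keyword extraction from text. Automatic keyword extraction is fast.")
```

Longer phrases built from words that co-occur often tend to rise to the top, which is why multi-word keywords usually outrank single words.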

Setup

Using pip

pip install keyword_ranker

Directly from the repository

git clone https://github.com/sbadecker/keyword_ranker.git
python keyword_ranker/setup.py install

Dependencies

This package requires the modules NLTK and six and will install them if necessary. To use lemmatization, nltk.corpus.wordnet is required and will be downloaded if necessary.

Languages

This package comes with an English stopwords list. You can specify your own set of stopwords by passing its file path as an argument (stopwords_path). As of now, lemmatization is only supported for English documents.
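The expected layout of a custom stopwords file is not spelled out here. The sketch below assumes a plain-text file with one or more words per line and optional comment lines starting with #, and shows how such a file could be written and read back. The load_stopwords helper and the file name are hypothetical and not part of this package.

```python
import os
import tempfile


def load_stopwords(path):
    """Read a stopwords file into a set (assumed format: words per line, # comments)."""
    stopwords = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            stopwords.update(word.lower() for word in line.split())
    return stopwords


# Write a small example file, then load it again.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords_de.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# German stopwords\nder\ndie\ndas\nund\n")
    german_stopwords = load_stopwords(path)
```

A file like this would then be the one handed over through stopwords_path.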

Usage

from keyword_ranker.kwr import KeywordRanker

kr = KeywordRanker()

kr.fit()  # Extracts and scores the keywords from the corpus.
# example: kr.fit("corpus.txt")

kr.rank()  # Ranks the n highest-scoring corpus keywords with regard to the provided documents.
# example: kr.rank(10, "document1.txt", "document2.txt")
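The measure that rank() uses to compare keywords with the benchmark documents is not described here. As a rough illustration of the idea only, the sketch below assumes the simplest possible measure: the total number of whole-word, case-insensitive occurrences of each keyword across the documents. Both function names are hypothetical.

```python
import re


def count_occurrences(keyword, text):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def rank_against_documents(keywords, documents, n=10):
    """Rank the first n keywords by total occurrences across all documents."""
    totals = {
        keyword: sum(count_occurrences(keyword, doc) for doc in documents)
        for keyword in keywords[:n]
    }
    # Highest count first; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


keywords = ["keyword extraction", "text mining", "neural network"]
documents = [
    "Keyword extraction is a text mining task. Keyword extraction helps.",
    "Text mining uses keyword extraction.",
]
ranking = rank_against_documents(keywords, documents)
```

Here keywords is assumed to be already sorted by corpus score, so keywords[:n] plays the role of the n highest-scoring corpus keywords.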

References

This package uses a Python implementation of the RAKE algorithm as described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley. The original code included in rake.py can be found here: https://github.com/zelandiya/RAKE-tutorial. I have extended it to support lemmatization using the WordNetLemmatizer from NLTK.

Versions of Python this code is tested against

  • 3.6

Contributing

Bug Reports and Feature Requests

Please use the issue tracker to report bugs or request features.

Development

Pull requests are most welcome.