wiki-passage-retriever

A small tool for retrieving Wikipedia passages relevant to a question


Keywords
annoy, approximate-nearest-neighbor-search, cli, click, colab-notebook, command-line-tool, flask-application, information-retrieval, pytorch, transformers
License
Apache-2.0
Install
pip install wiki-passage-retriever==0.1.3

Documentation

Wikipedia Passage Retriever

This package retrieves Wikipedia passages (i.e., paragraphs) relevant to a question.

Under the hood, it uses a dense passage retriever with a pretrained model from Hugging Face's transformers library.
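To make the idea of dense retrieval concrete, here is a minimal sketch. It is not the package's actual code. A real dense passage retriever encodes the question and each passage with pretrained neural encoders; in this sketch a toy hashed bag-of-words function (`toy_embed`, a hypothetical name) stands in for those encoders so that the example runs with only the standard library. The ranking step, which scores each passage by its inner product with the question vector and keeps the top k, is the part that carries over.

```python
import hashlib
import math
import re


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a pretrained encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash keeps the token -> dimension mapping identical across runs.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(question, passages, topk=5):
    """Return the topk (score, passage) pairs ranked by inner product with the question."""
    q = toy_embed(question)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(p))), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:topk]


passages = [
    "Nelson Mandela was a South African anti-apartheid activist and politician.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Mandela served as the first president of South Africa from 1994 to 1999.",
]
for score, passage in retrieve("Who was Nelson Mandela?", passages, topk=2):
    print(f"{score:.3f}  {passage}")
```

With a large collection, the exhaustive scoring loop in `retrieve` would typically be replaced by an approximate nearest-neighbor index, which is presumably what the `index` and `indexed-retrieve` commands are for.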

Usage

To install the package (which comes with the command-line tool), run the following command in a terminal:

  pip install wiki-passage-retriever

The easiest way to try the package is to use the command-line tool. For instance:

# Index a Wikipedia page:
wikiretriever index -q "Nelson Mandela" -f nelsonindex

# Retrieve relevant passages from the index:
wikiretriever indexed-retrieve -q "Who was Nelson Mandela?" -f nelsonindex -k 5

# Slow retrieval, without a saved index:
wikiretriever retrieve --query "Nelson Mandela" --question "Who was Nelson Mandela's father?" --topk 5
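The indexed workflow above can also be scripted from Python. The sketch below is one possible wrapper, not part of the package: it only builds the argument lists for the `index` and `indexed-retrieve` commands using the flags shown above, and runs them with `subprocess`. The helper names (`index_command`, `indexed_retrieve_command`, `run`) are hypothetical, and the sketch assumes that `wikiretriever` is on the PATH and writes its results to standard output.

```python
import subprocess


def index_command(query, index_name):
    """Build the argument list for `wikiretriever index` (flags as in the examples above)."""
    return ["wikiretriever", "index", "-q", query, "-f", index_name]


def indexed_retrieve_command(question, index_name, topk=5):
    """Build the argument list for `wikiretriever indexed-retrieve`."""
    return ["wikiretriever", "indexed-retrieve", "-q", question, "-f", index_name, "-k", str(topk)]


def run(command):
    """Run a command and return its standard output; raises if the tool exits with an error."""
    # Assumption: the tool prints its results to stdout.
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

For example, `run(index_command("Nelson Mandela", "nelsonindex"))` followed by `run(indexed_retrieve_command("Who was Nelson Mandela?", "nelsonindex"))` mirrors the first two commands. Passing the arguments as a list avoids shell quoting problems with questions that contain spaces or quotes.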

I also created a simple Flask application to retrieve and display the results.

Colab Notebook Examples

TODO:

  • Add option to retrieve k best passages
  • (Maybe) retrieve individual sentences instead of paragraphs?
  • (When the bug is fixed) Switch to Hugging Face's out-of-the-box tokenizer
  • Add option to run model on GPU
  • Add option to search across different Wikipedia articles (e.g., the first k results from a search query)
  • Add option to control text truncation (for now, the full text is always used)
  • Extract span (instead of outputting entire text)
  • Validate inputs and edge cases
  • Protect the database with transactions