Wikipedia Passage Retriever

This package retrieves the Wikipedia passages (i.e., paragraphs) most relevant to a question.

Under the hood, it uses a dense passage retriever with a pretrained model from Hugging Face's transformers library.
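To make the idea concrete, here is a minimal sketch of the ranking step in dense retrieval: the question and each passage are mapped to vectors, and passages are ordered by inner product with the question vector. This is not the package's code. The `embed` function is a toy bag-of-words stand-in for the pretrained encoder, and the names `embed` and `top_k_passages` are hypothetical, chosen only for this illustration.

```python
import math
import re
from collections import Counter


def embed(text, vocab):
    """Toy stand-in for a pretrained encoder (assumption, not the real model):
    an L2-normalised bag-of-words count vector over `vocab`."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k_passages(question, passages, k=5):
    """Return the k (score, passage) pairs with the highest inner product."""
    # Shared vocabulary so question and passage vectors have the same layout.
    vocab = sorted(
        {w for text in passages + [question] for w in re.findall(r"\w+", text.lower())}
    )
    q_vec = embed(question, vocab)
    scored = [
        (sum(a * b for a, b in zip(q_vec, embed(p, vocab))), p) for p in passages
    ]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A real dense retriever replaces `embed` with learned question and passage encoders, but the scoring and top-k selection follow the same pattern.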

Usage

To install the package (which includes the command-line tool), run the following command in a terminal:

  pip install wiki-passage-retriever

The easiest way to try the package is through the command-line tool. For instance:

# Index a Wikipedia page:
wikiretriever index -q "Nelson Mandela" -f nelsonindex

# Retrieve relevant passages from the index:
wikiretriever indexed-retrieve -q "Who was Nelson Mandela?" -f nelsonindex -k 5

# Slow retrieval (no index file is passed):
wikiretriever retrieve --query "Nelson Mandela" --question "Who was Nelson Mandela's father?" --topk 5
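The difference between the indexed and the slow path can be sketched as follows: indexing encodes every passage once and stores the vectors, so a later query only has to encode the question. This is an illustration under stated assumptions, not the package's implementation. The SQLite table layout, the hashed bag-of-words `embed` stand-in, and the function names `build_index` and `indexed_retrieve` are all hypothetical.

```python
import json
import math
import re
import sqlite3
import zlib

DIM = 4096  # size of the toy embedding space (assumption)


def embed(text):
    """Toy stand-in for a pretrained encoder: hashed, L2-normalised word counts."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(conn, passages):
    """Encode each passage once and store text plus vector in SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS passages (text TEXT, vector TEXT)")
    with conn:  # commits all inserts together, or rolls back on error
        conn.executemany(
            "INSERT INTO passages VALUES (?, ?)",
            [(p, json.dumps(embed(p))) for p in passages],
        )


def indexed_retrieve(conn, question, k=5):
    """Score stored vectors against the question; no passage is re-encoded."""
    q_vec = embed(question)
    rows = conn.execute("SELECT text, vector FROM passages").fetchall()
    scored = [
        (sum(a * b for a, b in zip(q_vec, json.loads(vector))), text)
        for text, vector in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

For example, `conn = sqlite3.connect(":memory:")` followed by `build_index(conn, passages)` prepares the index, after which `indexed_retrieve(conn, question, k=5)` can be called repeatedly without repeating the encoding work.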

I also created a simple Flask application to retrieve and display the results.

Colab Notebook Examples

TODO:

~~Add option to retrieve k best passages~~
(Maybe) retrieve individual sentences instead of paragraphs?
(When the bug is fixed) Switch to Hugging Face's out-of-the-box tokenizer.
Add option to run model on GPU
Add option to search across different Wikipedia articles (e.g., the first k results of a search query)
Add option to control text truncation (for now, the full text is always used).
Extract span (instead of outputting entire text)
Validate inputs and edge cases
Protect database by transactions

wiki-passage-retriever
Release 0.1.3
