Wikipedia Passage Retriever
This package retrieves Wikipedia passages (i.e., paragraphs) relevant to a question.
Under the hood, it uses a dense passage retriever with a pretrained model from Hugging Face's transformers library.
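To give a feel for what dense retrieval does, here is a minimal sketch of the ranking step only. It assumes the question and each passage have already been encoded into fixed-size vectors; the vectors below are made-up placeholders, not output of the pretrained model, and the helper names `dot` and `rank_passages` are hypothetical rather than part of this package.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k highest-scoring passages."""
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    # Higher inner product means the passage is considered more relevant.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embeddings (assumption: a real encoder would produce
# vectors with hundreds of dimensions).
passage_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
]
question_vec = [1.0, 0.0, 0.1]

top = rank_passages(question_vec, passage_vecs, k=2)
for index, score in top:
    print(f"passage {index}: score {score:.2f}")
```

In an actual dense retriever the passage vectors are typically computed once and stored, so only the question needs to be encoded at query time.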
Usage
To install the package (which includes the command-line tool), run the following command in a terminal:
pip install wiki-passage-retriever
The easiest way to try the package is the command-line tool. For instance:
# Index a Wikipedia page:
wikiretriever index -q "Nelson Mandela" -f nelsonindex
# Retrieve relevant passages from the index:
wikiretriever indexed-retrieve -q "Who was Nelson Mandela?" -f nelsonindex -k 5
# Slow retrieval (no prebuilt index):
wikiretriever retrieve --query "Nelson Mandela" --question "Who was Nelson Mandela's father?" --topk 5
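If you would rather drive the tool from a script, the commands above could be wrapped with `subprocess`. This is only a sketch: the function names are hypothetical, the flags are copied from the examples above, and nothing is assumed about the format of the tool's output (it is returned as raw text).

```python
import shlex
import subprocess
from typing import List


def index_command(page: str, index_file: str) -> List[str]:
    """Build the argument list for indexing a Wikipedia page."""
    return ["wikiretriever", "index", "-q", page, "-f", index_file]


def indexed_retrieve_command(question: str, index_file: str, k: int) -> List[str]:
    """Build the argument list for retrieving the top-k passages from an index."""
    return [
        "wikiretriever", "indexed-retrieve",
        "-q", question,
        "-f", index_file,
        "-k", str(k),
    ]


def run(command: List[str]) -> str:
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Show the shell-equivalent commands without executing anything.
print(shlex.join(index_command("Nelson Mandela", "nelsonindex")))
print(shlex.join(indexed_retrieve_command("Who was Nelson Mandela?", "nelsonindex", 5)))
```

Passing the arguments as a list avoids shell quoting problems with questions that contain spaces or apostrophes. To actually execute a command, pass it to `run`, for example `run(index_command("Nelson Mandela", "nelsonindex"))`.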
I also created a simple Flask application to retrieve and display the results.
TODO:
- Add option to retrieve k best passages
- (Maybe) retrieve individual sentences instead of paragraphs?
- (When the bug is fixed) Switch to Hugging Face's out-of-the-box tokenizer.
- Add option to run model on GPU
- Add option to search across different Wikipedia articles (e.g., the first k results of a search query)
- Add option to control text truncation (for now, the full text is always used).
- Extract span (instead of outputting entire text)
- Validate inputs and edge cases
- Protect database by transactions