Wikipedia Passage Retriever
This package retrieves Wikipedia passages (i.e., paragraphs) relevant to a question.
To install the package (which comes with the command-line tool), run the following command in a terminal:
pip install wiki-passage-retriever
The easiest way to try the package is to use the command-line tool. For instance:
# Index a Wikipedia page:
wikiretriever index -q "Nelson Mandela" -f nelsonindex

# Retrieve relevant passages from the index:
wikiretriever indexed-retrieve -q "Who was Nelson Mandela?" -f nelsonindex -k 5

# Slow retrieval:
wikiretriever retrieve --query "Nelson Mandela" --question "Who was Nelson Mandela's father?" --topk 5
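To make the idea behind the `-k` / `--topk` option concrete, here is a minimal, self-contained sketch of top-k passage ranking. It is not the package's implementation: the function names (`split_passages`, `overlap_score`, `top_k_passages`) are hypothetical, and the scorer is a plain token-overlap count swapped in for illustration in place of whatever model the package uses.

```python
import re
from collections import Counter


def split_passages(text):
    """Split raw article text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, passage):
    """Hypothetical scorer: number of question tokens also found in the passage."""
    q = Counter(tokenize(question))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def top_k_passages(question, text, k=5):
    """Return the k passages with the highest overlap score, best first."""
    passages = split_passages(text)
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:k]


article = (
    "Nelson Mandela was a South African leader.\n\n"
    "His father was Gadla Henry Mphakanyiswa.\n\n"
    "Bananas are yellow."
)
best = top_k_passages("Who was Nelson Mandela's father?", article, k=2)
```

Token overlap is only a stand-in: it ignores word order and meaning, so a learned relevance model would be expected to rank the passage about the father higher than this sketch does.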
I also created a simple Flask application to retrieve and display the results.
- Add option to retrieve k best passages
- (Maybe) retrieve individual sentences instead of paragraphs?
- (When the bug is fixed) Switch to Hugging Face's out-of-the-box tokenizer.
- Add option to run model on GPU
- Add option to search across different Wikipedia articles (e.g., the first k results of a search query)
- Add option to control text truncation (for now, always use full text).
- Extract span (instead of outputting entire text)