localretriever

A simple Python package


License
MIT
Install
pip install localretriever==0.0.2

Documentation

Retriever

Visually search and analyze your documents, entirely locally.


Install

Options:

  1. Install with pip (Stable Release)
$ pip install retriever-search
  2. Install from the GitHub repo (Latest Release)
$ git clone https://github.com/GovML/retriever.git
$ cd retriever
$ pip install -e .

We recommend using a virtual environment for all dependency installations. Before installing the repo, you can use venv to isolate the packages installed for this project and prevent conflicts with versions already installed on your computer.

$ python -m venv new_env
$ source new_env/bin/activate

Quickstart - Launching Retriever

Retriever is composed of two parts that you'll need to launch.

  1. Backend: The backend server ingests your documents and returns search results. This server is exposed locally via Flask.
  2. Frontend: The frontend is the user interface (UI) you use to input searches and visualize your results. The frontend sends requests to the backend server.

First you'll need to ensure you have a folder of PDFs on your computer. If you don't have PDFs handy, we've provided a script under tutorials to download a few example papers from arXiv.
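If you'd rather fetch a couple of test papers yourself, the sketch below downloads two well-known arXiv papers into ./pdfs_folder/ using only the standard library. It is a stand-in for the script under tutorials, not that script itself, and the paper IDs are just examples.

# Minimal sketch: download a couple of example papers from arXiv into ./pdfs_folder/.
import os
import urllib.request

ARXIV_IDS = ["1706.03762", "1810.04805"]  # example papers; swap in any IDs you like
OUT_DIR = "./pdfs_folder/"

os.makedirs(OUT_DIR, exist_ok=True)
for arxiv_id in ARXIV_IDS:
    url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
    dest = os.path.join(OUT_DIR, f"{arxiv_id}.pdf")
    print(f"Downloading {url} -> {dest}")
    urllib.request.urlretrieve(url, dest)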

Once you have your folder of PDFs, you can start up the backend search server by opening a terminal window, starting a Python interpreter, and running:

>>> from retriever_search import search_server
>>> search_server.run_search_server('./pdfs_folder/', json_save_path='save_results.json', device='cpu')

If your computer has a CUDA-compatible GPU, you can pass device='cuda'; if you are on a Mac, use device='mps'.
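
If you don't want to hard-code the device, a small sketch like the following can pick it at runtime. It assumes PyTorch is installed in your environment (used here only for hardware detection); the resulting string is then passed to run_search_server exactly as above.

# Pick a device string automatically, then start the backend with it.
import torch
from retriever_search import search_server

if torch.cuda.is_available():
    device = "cuda"          # NVIDIA GPU available
elif torch.backends.mps.is_available():
    device = "mps"           # Apple silicon GPU available
else:
    device = "cpu"           # fall back to CPU

search_server.run_search_server('./pdfs_folder/', json_save_path='save_results.json', device=device)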

Next, open a second terminal window, start another Python interpreter, and run the following:

>>> from retriever_search import frontend_app as fp
>>> fp.run_frontend()

Retriever should now be up and running. You can access the UI in your browser at http://127.0.0.1:7860. This URL works only on your local machine: paste it into your browser to render the UI, but you won't need an external internet connection to use it.
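
If you prefer launching both pieces from a single script instead of two terminals, one possible approach is sketched below. This is not part of the documented API, just the two documented calls wired together with the standard library; both calls block, so each gets its own process, and the sleep is a crude placeholder for waiting until ingestion finishes.

# Minimal sketch: run backend and frontend from one script.
import time
from multiprocessing import Process

from retriever_search import search_server, frontend_app

def start_backend():
    # Same call as the quickstart above.
    search_server.run_search_server('./pdfs_folder/',
                                    json_save_path='save_results.json',
                                    device='cpu')

if __name__ == "__main__":
    backend = Process(target=start_backend)
    backend.start()
    time.sleep(10)  # give the backend a head start before the UI sends requests
    frontend = Process(target=frontend_app.run_frontend)
    frontend.start()
    backend.join()
    frontend.join()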

Next time you run Retriever, you can point it directly at the JSON file you just saved your PDFs to, which saves ingestion time:

>>> search_server.run_search_server(input_json='save_results.json', device='cpu')

Full Parameter Guide for the Search Server

>>> search_server.run_search_server('input_directory', 'input_json', 'json_save_path', 'embedding_model', 'qa_model', device='cpu')

Search parameter definitions

  • input_directory -- the directory holding your files; optional if input_json is passed instead
  • input_json -- a pre-saved JSON file from an earlier run, used for faster loading; optional if input_directory is passed instead
  • json_save_path -- (optional) path for saving the embeddings to a JSON file that can be reused later as input_json
  • embedding_model -- (optional) the embedding model you want to use; the Spectre model is the default
  • qa_model -- (optional) you can currently pick between tiny, medium, and large
  • device -- (optional) can be set to cpu, mps, or cuda
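
Putting these together, a keyword-argument call might look like the sketch below. It assumes the parameters are accepted as keywords with the names listed above and that qa_model takes the literal strings 'tiny', 'medium', or 'large'; embedding_model is left at its default since the accepted identifiers aren't documented here.

from retriever_search import search_server

# Option A -- first run: ingest a folder of PDFs and cache the embeddings.
search_server.run_search_server(input_directory='./pdfs_folder/',
                                json_save_path='save_results.json',
                                qa_model='tiny',
                                device='cpu')

# Option B -- later runs: load the cached embeddings instead of re-ingesting.
# search_server.run_search_server(input_json='save_results.json',
#                                 qa_model='tiny',
#                                 device='cpu')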

Tickets

1.1.0

  • Make LDA visualization update
  • QA Model Improvements
  • Add support for HTML and txt files

1.0.2

  • Add Quickstart