📚 Documentation: https://france-travail.github.io/happy_vllm/
happy_vLLM is a REST API for vLLM, developed with production in mind. It adds several functionalities on top of vLLM.
You can install happy_vLLM using pip:

```bash
pip install happy_vllm
```

Or build it from source:

```bash
git clone https://github.com/France-Travail/happy_vllm.git
cd happy_vllm
pip install -e .
```
Just use the entrypoint happy-vllm (see arguments for the list of all possible arguments):

```bash
happy-vllm --model path_to_model --host 127.0.0.1 --port 5000 --model-name my_model
```
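Since happy_vLLM builds on vLLM, further launch options beyond the basics above are available. As a hedged sketch only (it assumes the usual vLLM engine flags such as --dtype and --max-model-len are passed through to the underlying engine; check arguments for what happy_vLLM actually exposes):

```bash
# Sketch only: assumes standard vLLM engine flags are forwarded to the engine.
# --dtype and --max-model-len are vLLM engine arguments, not happy_vLLM-specific.
happy-vllm --model path_to_model --host 127.0.0.1 --port 5000 \
  --model-name my_model --dtype half --max-model-len 4096
```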
This launches the API, which you can then query directly, for example with:

```bash
curl 127.0.0.1:5000/v1/info
```

to get various information about the application, or with:

```bash
curl 127.0.0.1:5000/v1/completions -d '{"prompt": "Hey,", "model": "my_model"}'
```

to generate your first LLM response using happy_vLLM. See endpoints for more details on all the endpoints provided by happy_vLLM.
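Since happy_vLLM builds on vLLM's OpenAI-compatible server, the completions endpoint should accept the standard sampling parameters in the JSON body. A slightly fuller request, as a sketch (max_tokens and temperature are standard OpenAI-compatible parameters, not happy_vLLM-specific additions):

```bash
# Sketch of a fuller completion request: the Content-Type header is set
# explicitly, and max_tokens / temperature are standard sampling parameters.
curl 127.0.0.1:5000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hey,", "model": "my_model", "max_tokens": 64, "temperature": 0.7}'
```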
A Docker image is available from the GitHub Container Registry:

```bash
docker pull ghcr.io/france-travail/happy_vllm:latest
```

See deploying_with_docker for more details on how to serve happy_vLLM with Docker.
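As a minimal sketch of running the image (assuming its entrypoint accepts the same arguments as the happy-vllm CLI; the port mapping, volume mount, and GPU flag below are placeholders to adapt to your setup):

```bash
# Hypothetical invocation: adjust the volume, ports and GPU access to your
# setup, and see deploying_with_docker for the options the image supports.
docker run --rm --gpus all \
  -v /path/to/models:/models \
  -p 5000:5000 \
  ghcr.io/france-travail/happy_vllm:latest \
  --model /models/my_model --host 0.0.0.0 --port 5000 --model-name my_model
```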
You can reach the Swagger UI at the /docs endpoint (so, by default, at 127.0.0.1:5000/docs). It lists all the endpoints together with examples of how to use them.