FastServe
Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe
Installation
Stable:
pip install FastServeAI
Latest:
pip install git+https://github.com/aniketmaurya/fastserve.git@main
Run locally
python -m fastserve
Usage/Examples
Serve LLMs with Llama-cpp
from fastserve.models import ServeLlamaCpp
model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
or, run python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf from the terminal.
Serve vLLM
from fastserve.models import ServeVLLM
app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
You can use the FastServe client, which automatically applies the chat template for you:
from fastserve.client import vLLMClient
from rich import print
client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
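With keep_context=True, the client accumulates the conversation and rebuilds the full prompt on every call, so the model sees earlier turns. A minimal sketch of that idea is below; the ChatContext class and the <|role|> template are illustrative assumptions, not the actual vLLMClient implementation (real clients use the chat template shipped with the model's tokenizer):

```python
# Illustrative sketch of chat-context handling; not FastServe's client code.
class ChatContext:
    def __init__(self):
        self.messages = []  # list of {"role": ..., "content": ...} dicts

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def to_prompt(self) -> str:
        # Generic chat template for illustration; real templates are model-specific.
        parts = [f"<|{m['role']}|>\n{m['content']}" for m in self.messages]
        return "\n".join(parts) + "\n<|assistant|>\n"

ctx = ChatContext()
ctx.add("user", "Write a python function to resize image to 224x224")
prompt = ctx.to_prompt()  # every turn is replayed into the prompt
```

Because the whole context is replayed each turn, long conversations grow the prompt; clearing client.context resets it.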
Serve SDXL Turbo
from fastserve.models import ServeSDXLTurbo
serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
or, run python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1 from the terminal.
This application comes with a UI, available at http://localhost:8000/ui.
Face Detection
from fastserve.models import FaceDetection
serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
or, run python -m fastserve.models --model face-detection --batch_size 2 --timeout 1 from the terminal.
Image Classification
from fastserve.models import ServeImageClassification
app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
or, run python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1 from the terminal.
Serve Custom Model
To serve a custom model, implement the handle method of FastServe. It receives a batch of inputs and returns the responses as a list.
from typing import List

from fastserve import FastServe  # BaseRequest also comes from the fastserve package

class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # build or load your model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        inputs = [b.request for b in batch]  # unwrap the raw request payloads
        response = self.model(inputs)
        return response

app = MyModelServing()
app.run_server()
You can run the above script in terminal, and it will launch a FastAPI server for your custom model.
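The batch_size and timeout arguments drive FastServe's dynamic batching: incoming requests are queued until the batch is full or the timeout expires, whichever comes first, and handle then receives whatever was collected. The waiting strategy can be sketched in plain Python (an illustration of the idea, not FastServe's internals):

```python
import queue
import time

def collect_batch(q: "queue.Queue", batch_size: int, timeout: float) -> list:
    """Pull up to batch_size items, waiting at most `timeout` seconds in total."""
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout expired: serve whatever arrived so far
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return batch

q = queue.Queue()
for item in ["req-1", "req-2", "req-3"]:
    q.put(item)
print(collect_batch(q, batch_size=2, timeout=0.1))  # → ['req-1', 'req-2']
```

A small batch_size with a short timeout keeps latency low; a larger batch_size with a longer timeout trades latency for GPU throughput.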
Deploy
Lightning AI Studio ⚡️
python -m fastserve.deploy.lightning --filename main.py \
--user LIGHTNING_USERNAME \
--teamspace LIGHTNING_TEAMSPACE \
--machine "CPU" # T4, A10G or A10G_X_4
Contribute
Install in editable mode:
git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .
Create a new branch
git checkout -b <new-branch>
Make your changes, commit, and open a PR.