Toolkit for using self-hosted large language models (LLMs), with additional support for full-service offerings such as OpenAI's GPT models.

Includes demos with RAG ("chat with your documents") and AGI/AutoGPT/privateGPT-style capabilities, via streamlit, Discord, command line, etc.

There are some helper functions for common LLM tasks, similar to those provided by projects such as langchain, but not meant to be as extensive. The OgbujiPT approach emphasizes simplicity and transparency.

Tested back ends are llama.cpp (custom HTTP API), llama-cpp-python (OpenAI HTTP API), text-generation-webui (AKA Oobabooga or Ooba) and in-memory hosted LLaMA-class (and more) models via ctransformers. In our own practice we apply these with Nvidia and Apple M1/M2 GPU enabled.

We also test with OpenAI's full-service GPT (3, 3.5, and 4) APIs, and apply these in our practice.

OgbujiPT is primarily developed by the crew at Oori Data. We offer data pipelines and software engineering services around AI/LLM applications.

Getting started

pip install ogbujipt

Just show me some code, dammit!

from ogbujipt.llm_wrapper import openai_chat_api, prompt_to_chat

llm_api = openai_chat_api(base_url='http://localhost:8000')  # Update for your LLM API host
prompt = 'Write a short birthday greeting for my star employee'

# You can set model params as needed
resp = llm_api.call(prompt_to_chat(prompt), temperature=0.1, max_tokens=256)
# Extract just the response text, but the entire structure is available
print(resp.first_choice_text)

The Nous-Hermes 13B LLM offered the following response:

Dear [Employee's Name], I hope this message finds you well on your special day! I wanted to take a moment to wish you a very happy birthday and express how much your contributions have meant to our team. Your dedication, hard work, and exceptional talent have been an inspiration to us all. On this occasion, I want you to know that you are appreciated and valued beyond measure. May your day be filled with joy and laughter.
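
For orientation, OpenAI-style chat endpoints take a list of role-tagged messages rather than a bare prompt string. The sketch below shows that shape. Note that `to_chat_messages` is a hypothetical helper written for illustration only; it is not OgbujiPT's `prompt_to_chat` implementation, which may differ in details.

```python
def to_chat_messages(prompt, system=None):
    '''
    Hypothetical stand-in (not part of OgbujiPT): wrap a plain prompt
    in the role/content message list used by OpenAI-style chat APIs
    '''
    messages = []
    if system is not None:
        # The optional system message goes first, to set overall behavior
        messages.append({'role': 'system', 'content': system})
    messages.append({'role': 'user', 'content': prompt})
    return messages


if __name__ == '__main__':
    print(to_chat_messages(
        'Write a short birthday greeting for my star employee',
        system='You are a helpful AI agent'))
```

The point is just that a chat request is plain data, so it's easy to inspect or log before sending it to your LLM host.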

Asynchronous by design

The example above shows the synchronous API, provided for dumb convenience, but for most use cases you'll want to use the asynchronous API. The following example uses it, and also adds a system message.

import asyncio
from ogbujipt.llm_wrapper import openai_chat_api, prompt_to_chat

llm_api = openai_chat_api(base_url='http://localhost:8000')  # Update for your LLM API host
prompt = 'Write a short birthday greeting for my star employee'

messages = prompt_to_chat(prompt, system='You are a helpful AI agent…')
# asyncio.run drives the coroutine to completion; no await needed at top level
resp = asyncio.run(llm_api(messages, temperature=0.1, max_tokens=256))
# Extract just the response text, but the entire structure is available
print(resp.first_choice_text)
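
The main payoff of an asynchronous API is that several requests can be in flight at once. Here is a minimal sketch of that pattern using only the standard library. `fake_llm_call` is a hypothetical stand-in for an awaitable LLM call (it just sleeps), so the example runs without any LLM server; in practice you would presumably await your `llm_api` object in its place.

```python
import asyncio


async def fake_llm_call(prompt, delay=0.1):
    '''Hypothetical stand-in for an awaitable LLM API call'''
    await asyncio.sleep(delay)  # Simulates network & model latency
    return f'Response to: {prompt}'


async def ask_all(prompts):
    # Start all the requests together; gather returns results in input order
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))


if __name__ == '__main__':
    prompts = ['Knock knock!', 'Write a haiku about GPUs']
    for text in asyncio.run(ask_all(prompts)):
        print(text)
```

Because the calls overlap, total wall-clock time is roughly that of the slowest request rather than the sum of all of them.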

llama.cpp HTTP API for flexible LLM control

Here's an example using a model hosted directly by llama.cpp's server.

import asyncio
from ogbujipt.llm_wrapper import prompt_to_chat, llama_cpp_http_chat

llm_api = llama_cpp_http_chat('http://localhost:8000')
resp = asyncio.run(llm_api(prompt_to_chat('Knock knock!'), min_p=0.05))
print(resp.first_choice_text)
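
If `min_p` is unfamiliar: as commonly described, min-p sampling discards candidate tokens whose probability falls below `min_p` times the probability of the most likely token, then renormalizes what remains. The sketch below illustrates that idea on a toy distribution. It is an illustration of the technique only, not llama.cpp's actual code, and `min_p_filter` is a hypothetical name.

```python
def min_p_filter(probs, min_p=0.05):
    '''
    Hypothetical illustration of min-p filtering over a dict of
    token -> probability. Not llama.cpp's implementation.
    '''
    # Cutoff scales with the top token's probability
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Renormalize so the surviving probabilities sum to 1
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == '__main__':
    toy = {'there': 0.5, 'who': 0.3, 'hello': 0.19, 'zebra': 0.01}
    print(min_p_filter(toy, min_p=0.05))
```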

ctransformers for local in-process loaded LLMs

Here's an example using a model loaded in-process using ctransformers.

from ctransformers import AutoModelForCausalLM

from ogbujipt.llm_wrapper import ctransformer as ctrans_wrapper

model = AutoModelForCausalLM.from_pretrained('TheBloke_LlongOrca-13B-16K-GGUF',
        model_file='llongorca-13b-16k.Q5_K_M.gguf', model_type="llama", gpu_layers=50)
llm = ctrans_wrapper(model=model)

print(llm(prompt='Write a short birthday greeting for my star employee', max_new_tokens=100))

For more examples…

See the demo directory. Demos include:

Basics:
- Use of basic LLM text completion to correct a data format (XML)
- Multiple simultaneous LLM queries via multiprocessing
Chatbots/agents:
- Simple Discord bot
Advanced LLM API features:
- OpenAI-style function calling
Retrieval Augmented Generation (RAG):
- Ask LLM questions based on web site contents, on the command line
- Ask LLM questions based on uploaded PDF, via Streamlit interactive UI
- Use PostgreSQL/PGVector for extracting context which can be fed to LLMs
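
To give a feel for the RAG demos, here is a toy sketch of the general flow: split a document into chunks, pick the chunks most relevant to a question, and fold them into a prompt. This is not code from the demos. All function names are hypothetical, and simple word overlap stands in for the embedding-based vector search (e.g. via PGVector) that a real setup would use.

```python
import re


def chunk_text(text, chunk_size=200):
    '''Split text into chunks of at most roughly chunk_size characters, on word boundaries'''
    chunks, current = [], ''
    for word in text.split():
        if current and len(current) + 1 + len(word) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = f'{current} {word}' if current else word
    if current:
        chunks.append(current)
    return chunks


def word_set(text):
    '''Lowercased set of distinct words in text'''
    return set(re.findall(r'\w+', text.lower()))


def top_chunks(question, chunks, k=2):
    '''Rank chunks by distinct words shared with the question (toy stand-in for vector search)'''
    q_words = word_set(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & word_set(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    '''Combine retrieved context & the question into a single prompt string'''
    context = '\n\n'.join(context_chunks)
    return ('Use the following context to answer the question.\n\n'
            f'Context:\n{context}\n\nQuestion: {question}')


if __name__ == '__main__':
    doc = 'Postgres stores vectors with pgvector. Bananas are yellow. Cats sit on mats.'
    question = 'How does pgvector store vectors?'
    print(build_prompt(question, top_chunks(question, chunk_text(doc, chunk_size=40), k=1)))
```

The resulting prompt string is what you would then send to the LLM.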

A bit more explanation

Many self-hosted AI large language models are now astonishingly good, even running on consumer-grade hardware, which provides an alternative for those of us who would rather not send all our data out over the network to the likes of ChatGPT & Bard. OgbujiPT provides a toolkit for using and experimenting with LLMs, either loaded into memory via ctransformers or accessed via OpenAI API-compatible network servers such as:

llama-cpp-python
text-generation-webui (AKA Oobabooga or Ooba)

OgbujiPT can invoke these to complete prompted tasks on self-hosted LLMs. It can also be used for building front ends to ChatGPT and Bard, if these are suitable for you.

Right now OgbujiPT requires a bit of Python development on the user's part, but more general capabilities are coming.

Bias to sound software engineering

I've seen many projects taking stabs at something like this one, but they really just seem to be stabs, usually by folks interested in LLMs who admit they don't have strong coding backgrounds. This not only leads to a lumpy patchwork of forks and variations, as people try to figure out the narrow, gnarly paths that cater to their own needs, but also hampers maintainability just at a time when everything seems to be changing drastically every few days.

I have a strong Python and software engineering background, and I'm looking to apply that in this project, to hopefully create something more easily specialized for other needs, built upon, maintained and contributed to.

This project is packaged using hatch, a modern Python packaging tool. I plan to write tests as I go along, and to incorporate continuous integration. I admit I may be slow to find the cycles for all that, but at least the intent and architecture are there from the beginning.

Prompting patterns

Different LLMs have different conventions you want to use in order to get high quality responses. If you've looked into self-hosted LLMs you might have heard of the likes of alpaca, vicuña or even airoboros. OgbujiPT includes some shallow tools in order to help construct prompts according to the particular conventions that would be best for your choice of LLM. This makes it easier to quickly launch experiments, adapt to and adopt other models.
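
As a rough illustration of what such conventions look like, here is a sketch that formats the same request in two widely documented styles, Alpaca and Vicuna. The function names are hypothetical and this is not OgbujiPT's prompting API. The templates follow the formats as commonly published, but check your model's card, since variants exist.

```python
ALPACA_PREAMBLE = ('Below is an instruction that describes a task. '
                   'Write a response that appropriately completes the request.')

VICUNA_PREAMBLE = ('A chat between a curious user and an artificial intelligence assistant. '
                   "The assistant gives helpful, detailed, and polite answers to the user's questions.")


def alpaca_prompt(instruction):
    '''Hypothetical helper: Alpaca-style instruction prompt (no separate input field)'''
    return f'{ALPACA_PREAMBLE}\n\n### Instruction:\n{instruction}\n\n### Response:\n'


def vicuna_prompt(user_message, system=VICUNA_PREAMBLE):
    '''Hypothetical helper: Vicuna-style single-turn prompt'''
    return f'{system} USER: {user_message} ASSISTANT:'


if __name__ == '__main__':
    request = 'Write a short birthday greeting for my star employee'
    print(alpaca_prompt(request))
    print(vicuna_prompt(request))
```

Keeping each convention in its own small function makes it cheap to swap models during experiments.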

Contributions

If you want to run the test suite, a quick recipe is as follows:

pip install ruff pytest pytest-mock pytest-asyncio respx pgvector asyncpg
pytest test

If you want to make contributions to the project, please read these notes.

Resources

Against mixing environment setup with code

License

Apache 2. For tha culture!

Credits

Some initial ideas & code were borrowed from these projects, but with heavy refactoring:

Related projects

mlx-tuning-fork: "very basic framework for parameterized Large Language Model (Q)LoRa fine-tuning with MLX. It uses mlx, mlx_lm, and OgbujiPT, and is based primarily on the excellent mlx-example libraries but adds very minimal architecture for systematic running of easily parameterized fine tunes, hyperparameter sweeping, declarative prompt construction, an equivalent of HF's train on completions, and other capabilities."
living-bookmarks: "Uses [OgbujiPT] to Help a user manage their bookmarks in context of various chat, etc."

FAQ

What's unique about this toolkit?
Does this support GPU for locally-hosted models?
What's with the crazy name?

What's unique about this toolkit?

I mentioned the bias to software engineering, but what does this mean?

Emphasis on modularity, but seeking as much consistency as possible
Support for multitasking
Finding ways to apply automated testing

Does this support GPU for locally-hosted models?

Yes, but you have to make sure you set up your back end LLM server (llama.cpp or text-generation-webui) with GPU, and properly configure the model you load into it.

Many install guides I've found for Mac, Linux and Windows touch on enabling GPU, but the ecosystem is still in its early days, and helpful resources can feel scattered.

What's with the crazy name?

Enh?! Yo mama! 😝 My surname is Ogbuji, so it's a bit of a pun. This is the notorious OGPT, ya feel me?

OgbujiPT
Release 0.9.2
