
Pebblo Gen-AI Data Analyzer

Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them in the UI or in a PDF report.

Pebblo has two components.

  1. Pebblo Server - a REST API application with topic-classifier, entity-classifier and reporting features
  2. Pebblo Safe DataLoader - a thin wrapper to Gen-AI framework's data loaders

Pebblo Server


Using pip

pip install pebblo --extra-index-url

Download Python package

Alternatively, download and install the latest Pebblo Python .whl package from URL


curl -LO "" 
pip install pebblo-0.1.13-py3-none-any.whl

Run Pebblo Server


Pebblo Server now listens on localhost:8000 and accepts Gen-AI application data snippets for inspection and reporting.

Pebblo Optional Flags
  • --config <file>: specify a configuration file in YAML format.

See the configuration guide for the settings that control Pebblo Server behavior, such as enabling snippet anonymization or selecting a specific report renderer.
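
Once the server has been started, a quick way to confirm that something is accepting connections on localhost:8000 is a plain TCP check. The sketch below uses only the Python standard library; the helper name is_port_open is an illustrative assumption and is not part of Pebblo. It only verifies that the port is open, not that the process behind it is Pebblo Server.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; the host and port defaults come from
    the address mentioned in this README.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "reachable" if is_port_open() else "not reachable"
    print(f"localhost:8000 is {status}")
```

If the check reports that the port is not reachable, the server is probably not running or is bound to a different address.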

Using Docker

docker run -p 8000:8000

The local UI can be accessed by pointing the browser to https://localhost:8000.

See the installation guide for details on passing a custom config.yaml and accessing PDF reports on the host machine.


Refer to the troubleshooting guide.

Pebblo Safe DataLoader


Pebblo Safe DataLoader is natively supported in the Langchain framework. It is available in Langchain versions >=0.1.7.

Enable Pebblo in Langchain Application

Add the PebbloSafeLoader wrapper to the existing Langchain document loader(s) used in the RAG application. PebbloSafeLoader is interface-compatible with Langchain BaseLoader, so the application can continue to use the load() and lazy_load() methods as it would on a Langchain document loader.
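
To illustrate what "interface compatible" means for a wrapper, here is a toy sketch in plain Python. It is not how PebbloSafeLoader is implemented: Document, ListLoader, and InspectingLoader are hypothetical stand-ins, and the inspection step simply records each snippet in a list. The point is only that the wrapper exposes the same load() and lazy_load() methods as the loader it wraps, so calling code does not change.

```python
from typing import Iterator, List


class Document:
    """Minimal stand-in for a loaded document (hypothetical, not Langchain's class)."""

    def __init__(self, page_content: str) -> None:
        self.page_content = page_content


class ListLoader:
    """Toy loader exposing the load()/lazy_load() pair."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def lazy_load(self) -> Iterator[Document]:
        # Yield documents one at a time instead of building the full list
        for text in self.texts:
            yield Document(text)

    def load(self) -> List[Document]:
        return list(self.lazy_load())


class InspectingLoader:
    """Toy wrapper: same methods as the wrapped loader, plus an inspection hook."""

    def __init__(self, inner: ListLoader, name: str) -> None:
        self.inner = inner
        self.name = name
        self.inspected: List[str] = []  # stands in for snippets sent for inspection

    def lazy_load(self) -> Iterator[Document]:
        for doc in self.inner.lazy_load():
            self.inspected.append(doc.page_content)  # record, then pass through unchanged
            yield doc

    def load(self) -> List[Document]:
        return list(self.lazy_load())


# Calling code is identical whether it holds the plain loader or the wrapper
loader = InspectingLoader(ListLoader(["row 1", "row 2"]), name="demo-app")
documents = loader.load()
print(len(documents), loader.inspected)
```

Because the wrapper delegates to the inner loader and returns its documents unchanged, swapping it in requires editing only the line that constructs the loader.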

Here is a snippet of a Langchain RAG application using CSVLoader before enabling PebbloSafeLoader.

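    from langchain.document_loaders.csv_loader import CSVLoader
    from langchain_community.embeddings import OpenAIEmbeddings
    from langchain_community.vectorstores import Chroma

    # file_path is the path to the CSV file to ingest
    loader = CSVLoader(file_path)
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())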

Pebblo SafeLoader can be enabled with a few lines of code changes to the above snippet.

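    from langchain.document_loaders.csv_loader import CSVLoader
    from langchain_community.document_loaders.pebblo import PebbloSafeLoader
    from langchain_community.embeddings import OpenAIEmbeddings
    from langchain_community.vectorstores import Chroma

    # Wrap the existing loader; file_path is the path to the CSV file to ingest
    loader = PebbloSafeLoader(
        CSVLoader(file_path),
        name="acme-corp-rag-1",  # App name (Mandatory)
        owner="Joe Smith",  # Owner (Optional)
        description="Support productivity RAG application",  # Description (Optional)
    )
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())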

See here for samples of Pebblo-enabled RAG applications and this document for more details.


Pebblo is an open-source community project. If you want to contribute, see the Contributor Guidelines for more details.


Pebblo is released under the MIT License.