jsonllm

Tools for working with LLMs on JSON data

Usage | Installation | Why | How

Usage

Usage: jsonllm [OPTIONS] COMMAND [ARGS]...

  Tools for working with LLMs on JSON data

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  embed  Turn a JSON of content into a JSON of embeddings.

Usage: jsonllm embed [OPTIONS]

  Turn a JSON of content into a JSON of embeddings.

Options:
  -i, --input PATH  File to embed
  -m, --model TEXT  Embedding model(s) to use
                    
                    Issue `llm embed-models list` to list available models.
                    
                    Currently installed are: ['3-large', '3-large-1024',
                    '3-large-256', '3-small', '3-small-512', 'ada-002',
                    'clip', 'jina-embeddings-v2-base-en', 'jina-
                    embeddings-v2-large-en', 'jina-embeddings-v2-small-en',
                    'onnx-bge-base', 'onnx-bge-large', 'onnx-bge-micro',
                    'onnx-bge-small', 'onnx-gte-tiny', 'onnx-minilm-l12',
                    'onnx-minilm-l6', 'sentence-transformers/all-MiniLM-L6-v2']
                    
                    You can install more via `llm install ...`
                    
                    You can find available models here: https://llm.datasette.io/en/stable/plugins/directory.html#embedding-models
  -j, --jq TEXT     Embed only the keys that satisfy the given jq filter
                    expression
  --in-arrays       Embed text appearing in arrays too
  --help            Show this message and exit.

CREATE TABLE people (data JSONB);

python tests/gen_people.py 100 |\
jsonllm embed -m clip -j '.name' |\
psql -c "\COPY people(data) FROM stdin"

echo '{"hello": "world"}' | jsonllm embed -m clip

Installation

pip install jsonllm

Available Models

Available embedding models are those provided and installed via the llm package.

llm-sentence-transformers adds support for embeddings using the sentence-transformers library, which provides access to a wide range of embedding models.
llm-clip provides the CLIP model, which can be used to embed images and text in the same vector space, enabling text search against images. See Build an image search engine with llm-clip for more on this plugin.
llm-embed-jina provides Jina AI's 8K text embedding models.
llm-embed-onnx provides seven embedding models that can be executed using the ONNX model framework.

llm install llm-sentence-transformers
llm install llm-clip
llm install llm-embed-jina
llm install llm-embed-onnx

For an up-to-date list, see the llm plugin directory: https://llm.datasette.io/en/stable/plugins/directory.html#embedding-models

Why

There are now plenty of tools for getting embeddings out of a corpus of text. Some can even generate embeddings from JSON documents, but they treat the JSON as plain text too.

JSON is rarely just text, though: JSON documents have structure and semantics that depend on the application using them. Most importantly, JSON is a data exchange format and a data aggregation tool, aggregation in the sense of getting data from A to B.

In my case, point A was a JSON object created by an SQL query against a Postgres database, piped through jsonllm and pushed into another Postgres instance specifically designed for AI-related experiments.

How

jsonllm traverses a JSON object recursively and replaces each text value with its embedding array.

Other data types are not modified at all and the overall object structure is not changed.
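
The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not jsonllm's actual implementation: `embed_json`, `toy_embed` and the `in_arrays` parameter are hypothetical names, and the way strings inside arrays are handled is an assumption modelled loosely on the `--in-arrays` option. The embedder is a stand-in; in practice it would be any callable that maps a string to a list of floats, such as an embedding model provided by llm.

```python
import json
from typing import Any, Callable, List

# Any callable mapping a string to a vector of floats can act as the embedder.
Embedder = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder (hypothetical): a 1-dimensional 'vector' holding the text length."""
    return [float(len(text))]


def embed_json(value: Any, embed: Embedder, in_arrays: bool = False,
               _inside_array: bool = False) -> Any:
    """Return a copy of `value` with string values replaced by their embeddings.

    Assumption: strings that are direct array elements are only embedded
    when `in_arrays` is True. Keys, numbers, booleans and nulls are kept as-is.
    """
    if isinstance(value, dict):
        # Recurse into object values; keys are never embedded.
        return {k: embed_json(v, embed, in_arrays, False) for k, v in value.items()}
    if isinstance(value, list):
        # Recurse into array elements, remembering that we are inside an array.
        return [embed_json(v, embed, in_arrays, True) for v in value]
    if isinstance(value, str) and (in_arrays or not _inside_array):
        return embed(value)
    # Every other data type passes through unchanged.
    return value


if __name__ == "__main__":
    doc = {"hello": "world", "n": 1}
    print(json.dumps(embed_json(doc, toy_embed)))  # {"hello": [5.0], "n": 1}
```

The function builds a new object instead of mutating its input, so the original document stays available for comparison or for storing alongside the embeddings.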

Development

pip install -e '.[test]'
pytest

jsonllm
Release 0.1.0a2
