Python SDK for the Kodexa Platform


Keywords
cloud, unstructured-data
License
Apache-2.0
Install
pip install kodexa==7.0.8709636589

Kodexa

Kodexa is a platform for building intelligent document processing pipelines. Its tools and services let you build a pipeline that takes a document, extracts its content, and then processes that content to pull out the information you need.

It is built on a set of core principles:

  • Document Centric - Kodexa is built around the idea of a document. A document is a collection of content nodes that are connected together. This is a powerful model that allows you to build pipelines that can extract content from a wide range of sources.

  • Pipeline Oriented - Kodexa is built around the idea of a pipeline. A pipeline is a series of steps that can be executed on a document. This allows you to build a pipeline that can extract content from a wide range of sources.

  • Extensible - Kodexa is built to be extended. With the SDK you can define your own actions, models, and pipelines and execute them on Kodexa.

  • Label Driven - Kodexa focuses on the idea of labels. Labels are a way to identify content within a document and then use that content to drive the processing of the document.
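The principles above can be illustrated with a small, self-contained sketch. It does not use the Kodexa SDK: the `Node` class, the step functions, and `run_pipeline` are hypothetical names invented for this example. They only show the general shape of a document made of connected content nodes, a pipeline as a series of steps, and labels used to find content afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical content node (not the SDK's class): content, children, labels."""
    content: str
    children: List["Node"] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# A step takes the root node of a document and returns it, possibly modified.
Step = Callable[[Node], Node]


def label_invoice_ids(root: Node) -> Node:
    """Label nodes whose content starts with the made-up prefix 'INV-'."""
    for node in root.walk():
        if node.content.startswith("INV-"):
            node.labels.append("invoice_id")
    return root


def label_amounts(root: Node) -> Node:
    """Label nodes whose content starts with a dollar sign."""
    for node in root.walk():
        if node.content.startswith("$"):
            node.labels.append("amount")
    return root


def run_pipeline(root: Node, steps: List[Step]) -> Node:
    """Run each step on the document in the order given."""
    for step in steps:
        root = step(root)
    return root


def find_labeled(root: Node, label: str) -> List[str]:
    """Return the content of every node that carries the given label."""
    return [node.content for node in root.walk() if label in node.labels]


# A tiny document: a root node with connected child nodes.
document = Node("invoice", children=[
    Node("INV-1001"),
    Node("line items", children=[Node("$12.50"), Node("$7.25")]),
])

processed = run_pipeline(document, [label_invoice_ids, label_amounts])
print(find_labeled(processed, "amount"))      # ['$12.50', '$7.25']
print(find_labeled(processed, "invoice_id"))  # ['INV-1001']
```

Each step in this sketch only adds labels. Reading the labeled content back is a separate query, which keeps the steps independent of one another and easy to reorder.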

Python SDK

This repository contains the Python SDK for Kodexa. The SDK is the primary way to interact with Kodexa. It allows you to define actions, models, and pipelines that can be executed on Kodexa. It also includes a complete SDK client for working with a Kodexa platform instance.

Documentation & Examples

Documentation is available at the Kodexa Documentation Portal.

Set-up

We use Poetry to manage our dependencies, so you can install them with:

poetry install

You can then run the tests with:

poetry run pytest

Contributing

We welcome contributions to the Kodexa platform. Please see our contributing guide for more details.

License

Apache 2.0