dataframes-haystack

Haystack custom components for your favourite dataframe library.



Dataframes Haystack



📃 Description

dataframes-haystack is an extension for Haystack 2 that enables integration with dataframe libraries.

The library provides custom converter components that turn data stored in dataframes into Haystack Document objects.

The dataframe libraries currently supported are pandas and Polars.

🛠️ Installation

# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"

💻 Usage

Tip: See the Example Notebooks for complete examples.

Pandas

import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}

Polars

import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)

Result:

>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
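
To make the row-to-document mapping in the results above more concrete, here is a minimal sketch that uses only the Python standard library. It is not the library's implementation: `SimpleDocument` and `rows_to_documents` are hypothetical names invented for this illustration, plain dictionaries stand in for dataframe rows, and using the row position as a string id is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""

    id: str
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: Sequence[Dict[str, Any]],
    content_column: str,
    meta_columns: Sequence[str] = (),
) -> Dict[str, List[SimpleDocument]]:
    """Build one document per row: one column becomes the content,
    the selected columns become the metadata."""
    documents = []
    for index, row in enumerate(rows):
        documents.append(
            SimpleDocument(
                id=str(index),  # assumption: the row position is used as the id
                content=row[content_column],
                meta={column: row[column] for column in meta_columns},
            )
        )
    # Mirror the shape of the results above: a dict with a "documents" key.
    return {"documents": documents}


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
for doc in result["documents"]:
    print(doc.id, doc.content, doc.meta)
```

As in the examples above, the return value is a dictionary, so the list of documents is read from its `"documents"` key.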

🤝 Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to open an issue or submit a PR!

Setup development environment

Requirements: hatch, pre-commit

  1. Clone the repository
  2. Run hatch shell to create and activate a virtual environment
  3. Run pre-commit install to install the pre-commit hooks. The hooks then run the linting and formatting checks on every commit.

Run tests

  • Linting and formatting checks: hatch run lint:fmt
  • Unit tests: hatch run test-cov-all

✍️ License

dataframes-haystack is distributed under the terms of the MIT license.