`dataframes-haystack` is an extension for Haystack 2 that enables integration with dataframe libraries.
The library offers custom Converter components that convert data stored in dataframes into Haystack `Document`
objects.
The dataframe libraries currently supported are:
- pandas
- polars
```bash
# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack
# for polars
pip install "dataframes-haystack[polars]"
```
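Because polars support is an optional extra, it can help to check which of the supported libraries are importable before picking a converter. A minimal sketch using only the standard library; `available_backends` and `SUPPORTED_BACKENDS` are hypothetical names, not part of dataframes-haystack.

```python
import importlib.util

# Hypothetical helper, not part of dataframes-haystack.
# Libraries the extension supports; polars needs the [polars] extra.
SUPPORTED_BACKENDS = ("pandas", "polars")


def available_backends() -> list[str]:
    """Return the supported dataframe libraries importable in this environment."""
    return [name for name in SUPPORTED_BACKENDS if importlib.util.find_spec(name) is not None]


found = available_backends()
print("installed:", found)
if "polars" not in found:
    print('polars not found; install it with: pip install "dataframes-haystack[polars]"')
```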
Tip: See the Example Notebooks for complete examples.
```python
import pandas as pd
from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

# Each row becomes one Document: "text" is the content, "filename" goes into meta
converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
```
Result:
```python
>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
```
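Conceptually, the converter maps one dataframe row to one document: the `content_column` value becomes the content and the `meta_columns` values become the metadata. The following is a minimal, library-free sketch of that mapping under that assumption; `SketchDocument` and `rows_to_documents` are hypothetical stand-ins, not the library's implementation (the real converters return Haystack `Document` objects, whose IDs and other fields are not modelled here).

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping, Optional, Sequence


@dataclass
class SketchDocument:
    """Hypothetical stand-in for haystack.Document (content and meta only)."""
    content: str
    meta: dict = field(default_factory=dict)


def rows_to_documents(
    rows: Iterable[Mapping[str, Any]],
    content_column: str,
    meta_columns: Optional[Sequence[str]] = None,
) -> dict:
    """Hypothetical sketch: one document per row, content and meta taken by column name."""
    documents = []
    for row in rows:
        meta = {col: row[col] for col in (meta_columns or [])}
        documents.append(SketchDocument(content=str(row[content_column]), meta=meta))
    # Mirror the {"documents": [...]} shape shown in the result above
    return {"documents": documents}


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
for doc in result["documents"]:
    print(doc.content, doc.meta)
```

With pandas, rows in this shape come from `df.to_dict(orient="records")`; with polars, from `df.to_dicts()`.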
```python
import polars as pl
from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

# Same configuration as the pandas example: "text" is the content, "filename" goes into meta
converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
```
Result:
```python
>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
```
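Both converters are configured by column name, so a typo in `content_column` or `meta_columns` is an easy mistake to make. How the library itself reports a missing column is not covered here; below is a hypothetical pre-check, assuming you want to fail early with a clear message before calling `run()`. `check_columns` is an illustrative name, not part of dataframes-haystack.

```python
from typing import Collection, Optional, Sequence


def check_columns(
    columns: Collection[str],
    content_column: str,
    meta_columns: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical helper: raise ValueError if the content column or any meta column is absent."""
    required = [content_column, *(meta_columns or [])]
    missing = [col for col in required if col not in columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}; available: {list(columns)}")


# In practice, pass df.columns (a pandas Index or a polars list of names); both support `in`.
check_columns(["text", "filename"], content_column="text", meta_columns=["filename"])
try:
    check_columns(["text"], content_column="text", meta_columns=["filename"])
except ValueError as err:
    print(err)
```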
Do you have an idea for a new feature? Did you find a bug that needs fixing?
Feel free to open an issue or submit a PR!
Requirements: `hatch`, `pre-commit`
- Clone the repository
- Run `hatch shell` to create and activate a virtual environment
- Run `pre-commit install` to install the pre-commit hooks. This enforces the linting and formatting checks on every commit.
- Linting and formatting checks: `hatch run lint:fmt`
- Unit tests: `hatch run test-cov-all`
`dataframes-haystack` is distributed under the terms of the MIT license.