elbow

Lift special-purpose data into common tabular formats for analytics 💪


Keywords
arrow, data-pipeline, elt, etl, parquet
License
MIT
Install
pip install elbow==0.1.2a0

💪 Elbow

Elbow is a lightweight and scalable library for getting diverse data out of specialized formats and into common tabular data formats for downstream analytics.

Example

The script below extracts image metadata and pixel values from all JPEG files under the current directory and saves them as a Parquet dataset.

import numpy as np
import pandas as pd
from PIL import Image

from elbow.builders import build_parquet

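def extract_image(path: str):
    """Return one record (path, size, and pixel array) for a single image."""
    img = Image.open(path)
    width, height = img.size
    # Decode the image into a NumPy array of pixel values.
    pixel_values = np.asarray(img)
    return {
        "path": path,
        "width": width,
        "height": height,
        "pixel_values": pixel_values,
    }

# Apply extract_image to every JPEG matched by the glob pattern, using
# 8 workers, and write the resulting records to a Parquet dataset.
build_parquet(
    source="**/*.jpg",
    extract=extract_image,
    output="images.pqds/",
    workers=8,
)

# Load the finished dataset into a pandas DataFrame.
df = pd.read_parquet("images.pqds")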

For a complete example, see here.
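
The same pattern should carry over to other file types. The sketch below is an illustration only, assuming that an extract function just needs to accept a file path and return a flat dict of fields, as extract_image does above. The name extract_wav is hypothetical and not part of Elbow; it relies only on the standard library wave module.

```python
import wave
from typing import Any, Dict


def extract_wav(path: str) -> Dict[str, Any]:
    """Return one record (one table row) describing a WAV file.

    Hypothetical helper for illustration; not part of Elbow.
    """
    with wave.open(path, "rb") as wav:
        n_frames = wav.getnframes()
        frame_rate = wav.getframerate()
        return {
            "path": path,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),  # bytes per sample
            "frame_rate": frame_rate,  # frames per second
            "n_frames": n_frames,
            "duration_s": n_frames / frame_rate,
        }
```

Under that assumption, extract_wav could be passed as the extract argument to build_parquet in place of extract_image, with a source pattern such as "**/*.wav".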

Installation

pip install elbow

The current development version can be installed with:

pip install git+https://github.com/childmindresearch/elbow.git

Related projects

There are many other high-quality projects for extracting, loading, and transforming data. Some alternative projects focused on somewhat different use cases are:

Contributing

We welcome contributions of any kind! If you'd like to contribute, please feel free to start a conversation in our issues.