grizzlys

Python DataFrames powered by Julia


Keywords
python, julia, pandas, polars, koalas, data-science, data-engineering, data-analysis, dataframe, dataframes, big-data, data, data-frame, data-frames, dataframe-library, dataframes-jl
License
Other
Install
pip install grizzlys==0.0.1.dev1

Documentation


grizzlys: User-friendly Python DataFrames powered by Julia

grizzlys is a Python package that provides a native interface on top of Julia's popular DataFrames.jl package.

It is a user-friendly alternative to existing Python packages such as pandas and polars, designed to be a convenient, easy-to-use DataFrame tool for data analysts, data engineers and data scientists alike. It aims to offer high-level abstractions without giving up performance, thanks to Julia's high-performance computing capabilities.

Why you might consider using grizzlys

✅ You are transitioning into Python from a Julia or R programming background

✅ You are accustomed to working with Jupyter notebooks (or a REPL) and performing exploratory data analysis (EDA) on-the-fly

✅ You need a quick-and-dirty data wrangling tool that provides ready-made macros and convenience functions out of the box

✅ You work with statistics or linear algebra often and require a wide range of statistical/algebraic functions to be well-integrated with your DataFrames

What grizzlys is (currently) NOT well-suited for

Larger-than-memory datasets - grizzlys' current implementation keeps all data in memory, so it is not a good choice for datasets that don't fit in your machine's RAM.

For such cases, Polars or Dask DataFrames are currently a much better choice.

Lazy Evaluation - grizzlys is currently designed to be fully eager: it executes each operation immediately, rather than building a task/computation graph and deferring execution until the result is needed.
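To make the eager/lazy distinction concrete, here is a minimal sketch in plain Python. It does not use or represent the grizzlys API; `eager_pipeline` and `LazyPipeline` are hypothetical names invented purely for illustration. The eager version runs every step as soon as it is written, while the lazy version only records steps and runs them when `collect()` is called.

```python
# Illustrative only: plain Python, NOT the grizzlys API.
# All names here (eager_pipeline, LazyPipeline) are hypothetical.
from typing import Callable, Iterable, List


def eager_pipeline(values: Iterable[int]) -> List[int]:
    """Eager style: each step runs immediately and builds a full list."""
    doubled = [v * 2 for v in values]      # executed right away
    kept = [v for v in doubled if v > 4]   # executed right away
    return kept


class LazyPipeline:
    """Lazy style: steps are only recorded; nothing runs until collect()."""

    def __init__(self, values: Iterable[int]) -> None:
        self._values = values
        self._steps: List[Callable[[Iterable[int]], Iterable[int]]] = []

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record a generator-producing step instead of applying fn now.
        self._steps.append(lambda it, fn=fn: (fn(v) for v in it))
        return self

    def filter(self, pred: Callable[[int], bool]) -> "LazyPipeline":
        # Record the filter; the predicate is not called yet.
        self._steps.append(lambda it, pred=pred: (v for v in it if pred(v)))
        return self

    def collect(self) -> List[int]:
        # Only here is the recorded chain of steps actually executed.
        result: Iterable[int] = self._values
        for step in self._steps:
            result = step(result)
        return list(result)
```

Both styles produce the same result for the same steps; the difference is when the work happens. A lazy engine can inspect the recorded steps before running them, which is what lets libraries with lazy modes reorder or fuse operations. An eager tool trades that away for immediate feedback, which suits interactive exploration.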

Backwards compatibility - grizzlys is built on Julia, a relatively new programming language, and is developed against a recent version of Python, with little regard for end-of-life versions or compatibility with Python 2.7, for example.

You should therefore not rely on grizzlys for integrations with very old code or any other legacy/deprecated tools and implementations.

Best-in-class Performance - Julia is widely considered a very high-performance language, which is a major reason it is used under the hood here. However, grizzlys is still a work in progress (WIP) and does not currently aim to compete with, or outperform, other high-performance DataFrame libraries such as Polars (written in Rust) or Modin (parallelized pandas).

This may no longer be a limitation in the future, as grizzlys matures and is optimized.

