pgeon

Apache Arrow PostgreSQL connector


Keywords
arrow, postgresql, postgres
License
MIT
Install
pip install pgeon==0.2.0a0

Documentation

Pgeon 🐦

Build License: MIT

Apache Arrow PostgreSQL connector

pgeon provides a C++ library and (very) simple python bindings. Almost all PostgreSQL native types are supported (see below).

This project is similar to pg2arrow and is heavily inspired by it. The main differences are the use of COPY instead of FETCH and that our implementation uses the Arrow C++ API.

The goal of pgeon is to provide fast bulk data download from a PostgreSQL database into Apache Arrow tables. If you're looking to upload data, you might want to have a look at Arrow ADBC.

Usage

from pgeon import copy_query
db = "postgresql://postgres@localhost:5432/postgres"
query = "SELECT TIMESTAMP '2001-01-01 14:00:00'"
tbl = copy_query(db, query)

The actual query performed is COPY ({query}) TO STDOUT (FORMAT binary), see this page for more information.

Installation

Building and running pgeon requires libpq to be available on your system.

Python

Install from source using pip with

git clone https://github.com/0x0L/pgeon.git
cd pgeon
pip install .

On linux, if pyarrow is already installed as a conda package, you may want to use

CONDA_BUILD=1 pip install .

[optional] C++ library and tools

This requires cmake and ninja. In addition you'll need to install libpq and the Arrow C++ libraries (e.g. arrow-cpp in conda)

mkdir build
cd build
cmake -GNinja ..
ninja

Performance

Elapsed time distributions of a query fetching 7 columns (1 timestamp, 2 ints, 4 reals) and around 4.5 million rows. The result is returned as a pandas.DataFrame in all cases.

Notes

  • Queries using ROW (e.g. SELECT ROW('a', 1)) do not work (anonymous structs)

  • SQL arrays are mapped to pyarrow.list_(...). Only 1D arrays are fully supported. Higher dimensional arrays will be flattened.

  • BitString types output format is not really helpful

  • tsvector types with letter weights are not supported

  • PostgreSQL range and domain types are not supported.