pyorc

Python module for reading and writing Apache ORC file format.


Keywords: python3, orc, apache-orc
License: Apache-2.0
Install: pip install pyorc==0.9.0

PyORC

Python module for reading and writing the Apache ORC file format. It uses Apache ORC's Core C++ API under the hood and provides an interface similar to that of the csv module in the Python standard library.

Supports only Python 3.8 or newer and ORC 1.7.

Features

  • Reading ORC files.
  • Writing ORC files.
  • Both work through Python's stream/file-like object IO interface.

That sums up the purpose of this project quite well.

Example

Minimal example for reading an ORC file:

import pyorc

with open("./data.orc", "rb") as data:
    reader = pyorc.Reader(data)
    for row in reader:
        print(row)

And another for writing one:

import pyorc

with open("./new_data.orc", "wb") as data:
    with pyorc.Writer(data, "struct<col0:int,col1:string>") as writer:
        writer.write((1, "ORC from Python"))

Contribution

Any contributions are welcome. If you would like to help with development, fork the repository or report an issue on GitHub. You can also help by improving the documentation.