PyORC
A Python module for reading and writing files in the Apache ORC format. It uses the Apache ORC Core C++ API under the hood and provides an interface similar to the csv module in the Python standard library.
Supports only Python 3.8 or newer and ORC 1.7.
Features
- Reading ORC files.
- Writing ORC files.
- Doing both through Python's stream/file-like object IO interface.
That sums up the purpose of this project quite well.
Example
Minimal example for reading an ORC file:
    import pyorc

    with open("./data.orc", "rb") as data:
        reader = pyorc.Reader(data)
        for row in reader:
            print(row)
And another for writing one:
    import pyorc

    with open("./new_data.orc", "wb") as data:
        with pyorc.Writer(data, "struct<col0:int,col1:string>") as writer:
            writer.write((1, "ORC from Python"))
Contribution
Any contributions are welcome. If you would like to help with development, fork the project or report an issue here on GitHub. You can also help by improving the documentation.