DEPRECATED - geofeather
The core functionality in
geofeather has been integrated directly into GeoPandas version 0.8.0.
See docs for instructions about how to use
read_feather in GeoPandas.
You are encouraged to use GeoPandas for this functionality. According to early benchmarks, it is even faster!
NOTE: you are not able to read
geofeather-created files directly into GeoPandas via
read_feather; you will first need to use
geofeather to read into a GeoDataFrame, then you can write that to a new feather file.
I may release an updated version to help migrate from
geofeather to the new functionality in GeoPandas. GeoPandas uses a metadata schema stored within the feather file to hold the CRS information and other details, which makes the new representation more compact (no more sidecar files for CRS info).
If you need help converting files created using
geofeather for use in GeoPandas, please create an issue.
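The two-step conversion described in the note above can be sketched as follows. This is a minimal illustration, not an official migration tool: the helper name `migrate_geofeather_file` is hypothetical, and it assumes both `geofeather` and GeoPandas 0.8.0 or later (with `pyarrow`) are installed in the same environment.

```python
def migrate_geofeather_file(src_path, dst_path):
    """Re-save a geofeather-created file in the GeoPandas feather format.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported lazily so this function can be defined even when the
    # packages are missing; both are required to actually call it.
    from geofeather import from_geofeather  # legacy reader from this project
    import geopandas  # noqa: F401  (GeoDataFrame.to_feather needs >= 0.8.0)

    # Step 1: read the old file (and its CRS sidecar) into a GeoDataFrame.
    gdf = from_geofeather(src_path)

    # Step 2: write it back out with GeoPandas, which stores the CRS
    # in the feather file's own metadata instead of a sidecar file.
    gdf.to_feather(dst_path)
    return dst_path
```

After conversion, the new file should be readable with `geopandas.read_feather(dst_path)`.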
A faster file-based format for geometries with geopandas.
This project capitalizes on the very fast
feather file format to store geometry (points, lines, polygons) data for interoperability with geopandas.
Why does this exist?
This project exists because reading and writing standard spatial formats (e.g., shapefile) in
geopandas is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.
In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via
.to_file() on a GeoDataFrame.
We see about 2x faster reads compared to geopandas read_file().
How does it work?
The feather format works brilliantly for standard
pandas data frames. To leverage the
feather format, we convert the geometry data from
shapely objects into Well-Known Binary (WKB) format and store that column as raw bytes.
We store the coordinate reference system using JSON format in a sidecar file
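To make the idea concrete, here is a small standard-library sketch of the two pieces described above: geometries serialized to WKB bytes, and the CRS written as JSON next to the data file. It is not geofeather's actual code. The function names, the `.crs` sidecar suffix, and the CRS dictionary are assumptions for illustration, and only 2D points are handled (the library relies on shapely for real geometries).

```python
import json
import struct
import tempfile
from pathlib import Path


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    # 1 = little-endian marker, 1 = WKB geometry type for Point
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(buf: bytes) -> tuple:
    """Decode a little-endian WKB 2D point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", buf)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return (x, y)


def write_crs_sidecar(data_path: Path, crs: dict) -> Path:
    """Write CRS info as JSON beside the data file (the '.crs' suffix is assumed)."""
    sidecar = data_path.with_name(data_path.name + ".crs")
    sidecar.write_text(json.dumps(crs))
    return sidecar


# The geometry column becomes a column of raw bytes, one WKB blob per row.
wkb_column = [point_to_wkb(1.0, 2.0), point_to_wkb(-105.5, 40.25)]
decoded = [wkb_to_point(blob) for blob in wkb_column]

# The CRS travels separately, as JSON in a sidecar file.
with tempfile.TemporaryDirectory() as tmp:
    sidecar_path = write_crs_sidecar(Path(tmp) / "test.feather", {"init": "epsg:4326"})
    restored_crs = json.loads(sidecar_path.read_text())
```

Because the bytes column is an ordinary column as far as feather is concerned, no geometry-specific support is needed in the file format itself.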
Available on PyPI at: https://pypi.org/project/geofeather/
pip install geofeather
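Given an existing GeoDataFrame
my_gdf, pass this into to_geofeather:
from geofeather import to_geofeather, from_geofeather
to_geofeather(my_gdf, 'test.feather')
To read the file back into a GeoDataFrame:
my_gdf = from_geofeather('test.feather')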
pygeos provides much faster geospatial operations over arrays of geometries.
geopandas is in the process of migrating to using
pygeos geometries as its internal data storage instead of shapely objects. Until
pygeos is fully integrated, there are shims in
geofeather to support interoperability with pandas DataFrames containing
pygeos geometries. If you are already using
pygeos against data you read from
geofeather, using the following shims will generate 3-7x speedups reading and writing data compared to
geofeather reading into GeoDataFrames.
Internally, the feather file is identical to the one created above.
pygeos is required in order to use this functionality.
WARNING: this will be deprecated as soon as
pygeos is integrated into geopandas.
from geofeather.pygeos import to_geofeather, from_geofeather

# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object
to_geofeather(df, 'test.feather', crs=crs)

# read the same file back into a DataFrame
df = from_geofeather('test.feather')

Note: no CRS information is returned when reading from geofeather into a DataFrame, in order to keep the function signature the same as above.
Right now, indexes are not supported in
feather files. To get around this, reset your index before calling to_geofeather.
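As a rough illustration of that workaround, the sketch below moves a named index into a regular column before writing and restores it after reading. The example frame and the `site_id` index name are made up for the demonstration, and the write/read step is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Hypothetical example frame with a meaningful, named index.
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "value": [10, 20, 30]},
    index=pd.Index([101, 102, 103], name="site_id"),
)

# Move the index into an ordinary column so it survives the round trip.
flat = df.reset_index()

# ... write `flat` with to_geofeather and read it back here ...

# Restore the original index from the saved column after reading.
restored = flat.set_index("site_id")
```

If the index carries no information (a default integer range), `df.reset_index(drop=True)` discards it instead of keeping it as a column.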
crs attribute to pandas DataFrame containing
- allow serializing to / from pandas DataFrames containing pygeos geometries (see notes above).
- use new CRS object in geopandas data frames (#4)
to_shp; use geopandas
- allow reading a subset of columns from a feather file
- store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)
- Initial release
Everything that makes this fast is due to the hard work of contributors to