GeoSpark Python

apache-spark, cluster-computing, geospatial, spatial-analysis, spatial-index, spatial-join, spatial-queries, spatial-sql
pip install geospark==1.3.1





GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.

GeoSpark contains several modules:

| Name     | API                | Spark compatibility                     | Introduction                                 |
|----------|--------------------|-----------------------------------------|----------------------------------------------|
| Core     | RDD                | Spark 2.X/1.X                           | SpatialRDDs and Query Operators.             |
| SQL      | SQL/DataFrame      | SparkSQL 2.1+                           | SQL interfaces for GeoSpark core.            |
| Viz      | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+   | Visualization for Spatial RDD and DataFrame. |
| Zeppelin | Apache Zeppelin    | Spark 2.1+, Zeppelin 0.8.1+             | GeoSpark plugin for Apache Zeppelin.         |

GeoSpark supports several programming languages: Scala, Java, SQL, Python and R.
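
As a rough illustration of how the SQL module might be used from Python, the sketch below builds a spatial range query and runs it on a Spark session. It assumes the `geospark` package from the `pip install` line above is installed and that the GeoSpark jars are available to the session. The table name `points`, the column name `geom`, and the helpers `build_range_query` and `run_range_query` are hypothetical names chosen for this example, not part of GeoSpark.

```python
def build_range_query(table, geom_col, min_x, min_y, max_x, max_y):
    """Build a GeoSparkSQL range query as a string.

    The query keeps rows whose geometry lies inside the rectangular
    envelope (min_x, min_y, max_x, max_y). `table` and `geom_col` are
    placeholder names supplied by the caller.
    """
    envelope = f"ST_PolygonFromEnvelope({min_x}, {min_y}, {max_x}, {max_y})"
    return f"SELECT * FROM {table} WHERE ST_Contains({envelope}, {geom_col})"


def run_range_query(spark, query):
    """Register GeoSpark's SQL functions on `spark`, then run `query`.

    Assumes `spark` is an existing SparkSession and that the table named
    in the query has already been registered as a temporary view.
    """
    # Imported lazily so the query builder above works without Spark installed.
    from geospark.register import GeoSparkRegistrator

    GeoSparkRegistrator.registerAll(spark)
    return spark.sql(query)


if __name__ == "__main__":
    # Only builds and prints the SQL text; no Spark cluster is needed here.
    print(build_range_query("points", "geom", 0.0, 0.0, 10.0, 10.0))
```

Keeping the query builder separate from the Spark call means the SQL text can be inspected or logged before any cluster resources are used. Values are interpolated directly into the string, so this sketch is only suitable for trusted inputs.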

Please visit the GeoSpark website for detailed documentation.


  • A research paper, "GeoSparkSim: A Microscopic Road Network Traffic Simulator in Apache Spark", has been accepted to MDM 2019, Hong Kong, China. The next release of GeoSpark will come with a built-in scalable traffic simulator. Please stay tuned!
  • A 1.5-hour tutorial, "Geospatial Data Management in Apache Spark", was presented by Jia Yu and Mohamed Sarwat at ICDE 2019, Macau, China. Visit our tutorial website to learn how to craft your "GeoSpark" from scratch.
  • GeoSpark 1.2.0 is released.



The GeoSpark ecosystem has around 10K downloads per month.