pysocrata

Python module for interfacing with Socrata open data platform metadata.


Keywords
data
License
MIT
Install
pip install pysocrata==0.2.1

Documentation


About

pysocrata is a small Python module which provides an interface to the Socrata open data portal catalog API.

It is not a binding for the Socrata SODA API, which provides direct access to the data within individual datasets. For that, see sodapy or rsocrata. pysocrata instead provides a way of getting a list of all datasets on a portal, for example, or of querying them for various dataset-level characteristics.

Quickstart

In the following brief overview I take domain to mean the open data portal of interest (for example, the New York City Open Data Portal, or the New York State Open Data Portal). An endpoint is a URL-accessible "thing" on a Socrata open data portal. A resource (or data endpoint) is an endpoint which contains actual novel data; this definition purposefully excludes endpoints which publish data filtered or remixed from another source. For instance, in the case of a map-type endpoint that displays a sub-selection of data from a different geospatial type resource, the originator is a resource, while the map endpoint is merely a non-data endpoint.

The methodology implemented in this module solves the surprisingly non-trivial problem of distinguishing between data and non-data endpoints on a Socrata open data portal, which are not immediately distinguishable in the Socrata catalog API.

Keeping these definitions in mind, this module has two user-facing top-level functions:

  • pysocrata.get_resources(domain, token) returns the metadata for every resource on a domain of interest.
  • pysocrata.count_resources(domain, token) returns a counter of resources by type.
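
As a rough illustration of how these two calls might be used together, here is a minimal sketch. It assumes that get_resources returns an iterable of dict-like metadata records and that each record carries a type field; the key name "type", the helper names fetch_resources and tally_by_type, and the sample records are all hypothetical, so inspect the real output before relying on any of them. The domain is assumed to be passed as a bare hostname such as data.cityofnewyork.us.

```python
from collections import Counter


def fetch_resources(domain, token):
    """Fetch resource metadata via pysocrata (needs the package and network access)."""
    import pysocrata  # imported lazily so the rest of this sketch runs without it
    return pysocrata.get_resources(domain, token)


def tally_by_type(resources, type_key="type"):
    """Count records by a type field.

    `type_key` is a hypothetical field name; check the metadata that
    get_resources actually returns to find the right one.
    """
    return Counter(record.get(type_key, "unknown") for record in resources)


# Made-up records standing in for real portal metadata.
sample = [{"type": "table"}, {"type": "geospatial"}, {"type": "table"}, {}]
counts = tally_by_type(sample)
print(counts)
```

With real data you would pass the result of fetch_resources(domain, token) to tally_by_type instead of the sample list. In practice pysocrata.count_resources(domain, token) already provides a count by type, so a helper like this is only useful when you want to group by some other field.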

You may also use pysocrata.get_endpoints_using_raw_json_emission and pysocrata.get_endpoints_using_catalog_api to fetch the raw catalog API streams.

Further Reading