forecastcards

Data specification for travel forecasting cards in order to assess performance of travel forecasts


License
Apache-2.0
Install
pip install forecastcards==0.1.dev1

Documentation

What are forecast cards?

Forecast cards are a simple data specification for storing key information about your travel forecast in order to:

  • evaluate performance of a forecast over time,
  • analyze the collective performance of forecasting systems and institutions over time, and
  • identify contributing factors to high-performing forecasts.

Overview of forecast cards

There are five types of forecast cards:

  • Points of Interest, such as a roadway segment or transit line,
  • Projects, such as a roadway expansion or an HOV designation,
  • Scenarios or runs, including information about the forecasting system,
  • Forecasts, which are predictions at the points of interest about what the project will do, and
  • Observations, which are points of data used to evaluate the forecasts.

Each "card" is a text-based CSV file.

The Schema

entity relationship diagram

Overview of data relationships

Forecast Cards are compatible with the Open Knowledge Foundation's Frictionless Data Table Schema specification.
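
Because each card is a CSV file described by a Table Schema descriptor, a card can be checked with a few lines of code. The sketch below is only an illustration using the Python standard library: the descriptor `POI_SCHEMA`, its field names (`poi_id`, `facility_type`, `lanes`), and the helper `check_card` are hypothetical and are not taken from the forecastcards schema. It handles only the `string`, `integer`, and `number` types and the `required` constraint.

```python
import csv
import io

# Hypothetical Table Schema descriptor. The real field names are defined by
# the forecastcards specification and may differ from these.
POI_SCHEMA = {
    "fields": [
        {"name": "poi_id", "type": "string", "constraints": {"required": True}},
        {"name": "facility_type", "type": "string"},
        {"name": "lanes", "type": "integer"},
    ]
}

# Only a small subset of Table Schema types is handled in this sketch.
CASTS = {"string": str, "integer": int, "number": float}


def check_card(csv_text, schema):
    """Return a list of human-readable problems found in one card."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [f["name"] for f in schema["fields"] if f["name"] not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        for field in schema["fields"]:
            name = field["name"]
            value = row[name] or ""  # short rows yield None; treat as empty
            required = field.get("constraints", {}).get("required", False)
            if value == "":
                if required:
                    problems.append(f"row {row_number}: {name} is required")
                continue
            try:
                CASTS[field["type"]](value)
            except ValueError:
                problems.append(
                    f"row {row_number}: {name}={value!r} is not {field['type']}"
                )
    return problems
```

An empty list means the card passed these limited checks. A dedicated Table Schema validator covers far more of the specification than this sketch does.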

Explore the data schema from your web browser using Colaboratory:

Open In Colab

Included Examples

This project currently includes one example: the Emerald City DOT's HOV expansion for the Yellow Brick Road, which is contained in forecastcards/examples/emeraldcitydot-rx123-yellowbrickroadhov.

This example can be analyzed with the notebooks in the notebooks folder of this directory, which can be run using Binder or Colaboratory.


Table Validity Status: goodtables.io

Suggested card naming and organization

In order to leverage a common set of tools, we suggest that forecast card data be stored using the following naming and folder structure:

agency-name-project-id-project-short-name/
   |---README.md
   |---
   |---project-<project-id>-<project-short-name>.csv
   |---scenarios-<project-id>.csv
   |---poi-<project-id>.csv
   |---observations/
   |   |---observations-<date>.csv
   |
   |---forecasts/
   |   |---forecast-<scenario-id>-<scenario-year>-<forecast-creation>-<forecast-id>.csv
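
The naming convention above can be generated programmatically so that every project folder starts out consistent. The following is a minimal sketch, not part of the forecastcards package: the helpers `project_layout`, `forecast_filename`, and `scaffold` are hypothetical names, and the sketch assumes the agency name, project ID, and short name are joined with hyphens, as in the included example folder.

```python
from pathlib import Path


def project_layout(agency, project_id, short_name):
    """Describe the suggested files and folders for one project."""
    root = Path(f"{agency}-{project_id}-{short_name}")
    return {
        "root": root,
        "files": [
            root / "README.md",
            root / f"project-{project_id}-{short_name}.csv",
            root / f"scenarios-{project_id}.csv",
            root / f"poi-{project_id}.csv",
        ],
        "dirs": [root / "observations", root / "forecasts"],
    }


def forecast_filename(scenario_id, scenario_year, forecast_creation, forecast_id):
    """Build a forecast card file name following the suggested pattern."""
    return (
        f"forecast-{scenario_id}-{scenario_year}-"
        f"{forecast_creation}-{forecast_id}.csv"
    )


def scaffold(base_dir, agency, project_id, short_name):
    """Create empty placeholder files and folders under base_dir."""
    base = Path(base_dir)
    layout = project_layout(agency, project_id, short_name)
    for folder in layout["dirs"]:
        (base / folder).mkdir(parents=True, exist_ok=True)
    for file_path in layout["files"]:
        (base / file_path).touch(exist_ok=True)
    return base / layout["root"]
```

For instance, `scaffold(".", "emeraldcitydot", "rx123", "yellowbrickroadhov")` would create a folder with the same name as the included example. The created files are empty, so they still need headers and data before they will validate.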

How do I start on my own?

  1. Make sure you have the required data by examining the schema.

  2. Format your data as forecast cards (helper scripts are on the way).

  3. Use the template notebooks locally or on a hosted remote server (e.g. Colaboratory) to clean data and estimate quantile regressions.

Making forecast cards publicly available

There are three likely options for making your data available:

  1. GitHub (not great for extremely large datasets)
  2. Amazon S3 / Microsoft Azure / Google Cloud (functionality coming soon)
  3. Other agency-hosted web services (e.g. Socrata or a web server)

Submitting forecast cards to the community data store

You can submit forecast cards to the community data store by:

  1. submitting a pull request to the forecastcardsdata repository,
  2. submitting an issue with a link to the location of the data, along with permission to host it on the repository, or
  3. setting up the public data store as a mirror.

Getting Help

Please submit an issue!

Suggested Workflow

Initial setup

  • decide where the permanent "cold storage" of your data will live: local file server, cloud?
  • catalog and convert historic data

Starting a new project

From Template [recommended]

  1. Copy the template folder from the forecastcards package to the folder that holds all of your project forecast cards.
  2. Rename the project folder according to the schema, taking care not to duplicate any project IDs within your analysis scope (usually your agency or the forecastcardsdata store).
  3. Add observations, POIs, forecast runs, and forecasts for specific POIs as they are determined or created.
  4. Confirm that the data in the new project conforms to the data schema by running python check_cards.py -pd <project_directory>. To check all the projects in a directory, run python check_cards.py from that directory.

Adding a forecast to an existing project

  1. Add a new forecast csv file with relevant data for the points of interest.
  2. Add an entry about the model run to the scenario csv file.
  3. Add any additional points of interest to the poi csv file.
  4. Confirm that the new data in the project conforms to the data schema by running python check_cards.py -pd <project_directory>. To check all the projects in a directory, run python check_cards.py from that directory.

Adding observed data to an existing project

  1. Add a new observations csv file.
  2. Confirm that the new data in the project conforms to the data schema by running python check_cards.py -pd <project_directory>. To check all the projects in a directory, run python check_cards.py from that directory.
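
Adding an observations file can be scripted so that the file always lands in the observations folder with a dated name. The sketch below is illustrative only: `write_observations` is a hypothetical helper, the ISO date format (YYYY-MM-DD) for `<date>` is an assumption, and the column names in the usage note are placeholders rather than the fields required by the schema.

```python
import csv
from datetime import date
from pathlib import Path


def write_observations(project_dir, rows, observed_on):
    """Write rows (dicts sharing the same keys) to a new observations card.

    The file name follows observations-<date>.csv, with <date> assumed
    here to be an ISO date such as 2019-06-30.
    """
    if not rows:
        raise ValueError("no observation rows to write")
    out_dir = Path(project_dir) / "observations"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"observations-{observed_on.isoformat()}.csv"
    if out_path.exists():
        # Refuse to overwrite an existing card; pick a new date instead.
        raise FileExistsError(out_path)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

A call such as `write_observations(project_dir, [{"poi_id": "ybr-1", "observed_value": 1200}], date(2019, 6, 30))` would create observations/observations-2019-06-30.csv. Running check_cards.py afterwards remains the way to confirm that the new file conforms to the schema.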

Run analysis

The analysis steps, as summarized in the Estimate_Quantiles.ipynb notebook, are:

  1. Select cards to use
  2. Clean and merge cards
  3. Create any additional categorical variables
  4. Perform regressions
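
The first three steps can be sketched with the Python standard library. This is not the code from the Estimate_Quantiles.ipynb notebook. The column names (`poi_id`, `forecast_value`, `observed_value`), the 10,000 threshold for `volume_class`, and the percent-difference definition are assumptions made for illustration. In place of the quantile regression in step 4, the sketch reports plain empirical quartiles of the percent difference, which is a much simpler summary.

```python
import statistics


def merge_cards(forecasts, observations):
    """Join forecast and observation rows on poi_id and derive variables."""
    observed = {row["poi_id"]: float(row["observed_value"]) for row in observations}
    merged = []
    for row in forecasts:
        forecast = float(row["forecast_value"])
        if row["poi_id"] not in observed or forecast == 0:
            continue  # drop unmatched rows and avoid division by zero
        actual = observed[row["poi_id"]]
        merged.append(
            {
                "poi_id": row["poi_id"],
                "forecast_value": forecast,
                "observed_value": actual,
                # Assumed definition: percent difference from the forecast.
                "pct_diff": 100.0 * (actual - forecast) / forecast,
                # Hypothetical categorical variable with an arbitrary cutoff.
                "volume_class": "high" if forecast >= 10000 else "low",
            }
        )
    return merged


def pct_diff_quartiles(merged):
    """Empirical quartiles of pct_diff (a stand-in, not a regression)."""
    values = sorted(row["pct_diff"] for row in merged)
    return statistics.quantiles(values, n=4, method="inclusive")
```

The merged rows could then be passed to a statistics package that offers quantile regression, with `pct_diff` or the observed value as the response and variables such as `volume_class` as predictors.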