yayes

Repair erroneous entries in data streams while maintaining accuracy of overall trends.


Keywords
yayes, smoothing, data, error, fill, patching, trend, Greg, Yannett, Josh, Hayes
License
MIT
Install
pip install yayes==1.0.6

Documentation

Yayes Data Patching

Overview:

Yayes patching is designed to repair erroneous entries in data streams while preserving the accuracy of overall trends. The yayes package is most useful when a variable's values drop suddenly and temporarily and then return to their correct level (e.g. discrepancies caused by missing data or data input errors). The method was inspired by the logic behind max-pooling: high values often carry more meaningful information, while lower values can more often be discarded without losing relevant information.

Yayes is not intended to resolve all data errors and should only be applied intentionally to those specific contexts where it defensibly improves subsequent analyses.

Example Code

import yayes 

df[yayes_col] = yayes(col_to_yayes)

Data Patching Logic

The Yayes patcher takes in an array and calculates a difference array. A walker then traverses the array and activates only once a negative difference occurs. If a second negative difference occurs, the walker deactivates. Otherwise, the walker continues along the array for as long as the values do not differ significantly. Once a significant increase is observed, the walker measures the magnitude of the increase and either marks the gap to be patched or deactivates. Patching is done by calling a separate function that applies a linear transformation between two points of the array. After patching, the walker resumes traversing the array just after the point of initial activation.
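The walk described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the package's implementation: the function names (patch_drops, linear_patch), the parameters (flat_tol, return_tol) and their default values are all assumptions made for this example, and the real yayes thresholds may be defined differently.

```python
import numpy as np


def linear_patch(values, start, end):
    """Overwrite values[start..end] in place with a straight line between the endpoints."""
    values[start:end + 1] = np.linspace(values[start], values[end], end - start + 1)


def patch_drops(values, flat_tol=0.05, return_tol=0.10):
    """Hypothetical sketch of the walk; not the yayes API.

    flat_tol   -- assumed: largest relative step still treated as "flat"
    return_tol -- assumed: how close the recovered value must be to the
                  pre-drop value (relative) for the gap to be patched
    """
    out = np.asarray(values, dtype=float).copy()
    n = len(out)
    i = 1
    while i < n:
        if out[i] - out[i - 1] < 0:              # negative difference: activate
            start = i - 1                        # last point before the drop
            scale = max(abs(out[start]), 1e-12)  # guard against a zero baseline
            j = i + 1
            while j < n:
                step = out[j] - out[j - 1]
                if step < 0:                     # second negative difference: deactivate
                    break
                if step > flat_tol * scale:      # significant increase observed
                    if abs(out[j] - out[start]) <= return_tol * scale:
                        linear_patch(out, start, j)  # values returned: patch the gap
                    break                        # otherwise deactivate without patching
                j += 1
        i += 1                                   # resume just after the activation point
    return out


series = [100, 102, 3, 3, 4, 104, 106]
print(patch_drops(series))
```

In this example the three low readings sit between 102 and 104, so they are replaced by values rising linearly from 102 to 104. A series that drops and never recovers (a genuine level shift) is left unchanged, because no significant increase is ever observed.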

Examples of Data Patching

  • Examples of Yayes Data Patching

[Four example images, each captioned "Initial History Completed Yayes"]

Arguments and Adjustments

  • Threshold Ceiling to Continue Walk
  • Increases-Window to Deactivate Walk
  • Return Range to Activate Patching
  • Growth Floor to Activate Patching
  • Final Entry Lifting On/Off
  • Final Entry Lift Threshold

Contact

Correspondence can be directed to gsyann@berkeley.edu and hayesjosh@alumni.stanford.edu.