pandas-transform-checker

function annotations to check properties on pandas dataframe transformations


License
BSD-1-Clause
Install
pip install pandas-transform-checker==0.1.1

Documentation

Pandas transform checker

What is it?

This library focuses on data quality checks for pandas transformations. Transformations are functions that take a pandas DataFrame as input (plus other parameters) and return a DataFrame.

This library allows the user to specify a contract that the function must respect. In this contract the user can specify:

  • the added columns
  • the deleted columns
  • the modified columns
  • whether the function adds/drops records
  • whether the function modifies the index (e.g. resampling)

Once the contract is specified, the function will raise a RuntimeError if one of its specifications is violated.
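To make the idea concrete, here is a minimal sketch of how such a check can work. It is not the library's implementation: `simple_contract` is a hypothetical stand-in that only checks column additions and deletions, and its wrapper only accepts keyword arguments to keep the example short.

```python
import functools

import pandas as pd


def simple_contract(df_param, added=(), deleted=()):
    """Hypothetical stand-in, NOT the library's decorator.

    Only column additions and deletions are checked.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # Record the input columns before the transformation runs.
            before = set(kwargs[df_param].columns)
            result = func(**kwargs)
            after = set(result.columns)
            # Compare the observed changes with the declared ones.
            if after - before != set(added):
                raise RuntimeError(f"unexpected added columns: {sorted(after - before)}")
            if before - after != set(deleted):
                raise RuntimeError(f"unexpected deleted columns: {sorted(before - after)}")
            return result
        return wrapper
    return decorator


@simple_contract(df_param="df_input", added={"total"})
def add_total(df_input):
    # assign returns a new frame, so the input is left untouched.
    return df_input.assign(total=df_input["a"] + df_input["b"])


@simple_contract(df_param="df_input", added={"total"})
def bad_transform(df_input):
    # Violates the contract: drops "b" without declaring it.
    return df_input.assign(total=df_input["a"]).drop(columns=["b"])


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
ok = add_total(df_input=df)  # respects the contract

try:
    bad_transform(df_input=df)
    violation = None
except RuntimeError as exc:
    violation = str(exc)
```

The real decorator covers more properties (edited columns, index, records), but the principle is the same: compare the input and the output, and raise when the difference does not match what was declared.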

How to use it?

The package contains the decorator that performs the check. It can be imported as follows:

from pandas_transform_checker.decorator_contract_checker import input_df_contract

Args

  • df_param: name of the function parameter that holds the input DataFrame
  • contract_dict: dict defining the contract of the function, in the following format:

contract_dict = {
    "col_additions": {
        "col_a": "int",
        "col_b": "float"
    },
    "col_deletions": {
        "col_c",
        "col_d"
    },
    "col_editions": {
        "col_e",
        "col_f"
    },
    "allow_index_edition": False,
    "allow_add_drop_record": True
}

This means that the function must create "col_a" and "col_b", must delete "col_c" and "col_d", must not modify any column data except "col_e" and "col_f", and must not edit the index. It is allowed to drop records.

Here is the list of keys allowed in this dict:
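As an illustration, a transformation that follows this contract could look like the sketch below. The function body, the sample data and the extra column "col_g" are made up for the example, and how the library compares the dtype strings "int" and "float" with actual pandas dtypes is an assumption not covered here.

```python
import pandas as pd


def super_func(df_input):
    # Work on a copy so the caller's frame is not mutated.
    df = df_input.copy()
    # col_additions: "col_a" (int) and "col_b" (float).
    df["col_a"] = 1
    df["col_b"] = 0.5
    # col_deletions: "col_c" and "col_d".
    df = df.drop(columns=["col_c", "col_d"])
    # col_editions: only "col_e" and "col_f" change.
    df["col_e"] = df["col_e"] * 2
    df["col_f"] = df["col_f"].str.upper()
    # The index is left untouched (allow_index_edition is False).
    return df


# Hypothetical input: "col_g" is not in the contract, so it must stay as-is.
df_in = pd.DataFrame({
    "col_c": [1, 2],
    "col_d": [3, 4],
    "col_e": [5, 6],
    "col_f": ["x", "y"],
    "col_g": [7.0, 8.0],
})
df_out = super_func(df_in)
```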

  • col_additions: dict where keys are column names and values are dtypes (string)
  • col_deletions: set of str representing the deleted columns
  • col_editions: set of str representing the modified columns
  • allow_index_edition: bool indicating whether the function may modify the index
  • allow_add_drop_record: bool indicating whether the function may add or drop records (e.g. when dropna is used)
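Since a misspelled key is easy to miss, it can help to check a contract dict before using it. The helper below is a hypothetical sketch and is not part of the package: `validate_contract` and `ALLOWED_KEYS` are names invented here, and only the key names and types come from the list above.

```python
# Key names and expected types as listed above; the validator itself is hypothetical.
ALLOWED_KEYS = {
    "col_additions": dict,
    "col_deletions": set,
    "col_editions": set,
    "allow_index_edition": bool,
    "allow_add_drop_record": bool,
}


def validate_contract(contract_dict):
    """Return a list of problems found in contract_dict (empty if none)."""
    problems = []
    for key, value in contract_dict.items():
        expected = ALLOWED_KEYS.get(key)
        if expected is None:
            problems.append(f"unknown key: {key}")
        elif not isinstance(value, expected):
            problems.append(f"{key} should be a {expected.__name__}")
    return problems


good = validate_contract({"col_editions": {"col_e", "col_f"}, "allow_index_edition": False})
# "col_edition" is a typo, and col_deletions is a list instead of a set.
bad = validate_contract({"col_edition": {"col_e"}, "col_deletions": ["col_c"]})
```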

Usage

When you have a function that takes a DataFrame as input:

def super_func(df_input):
    ...

just add the decorator to have the properties checked automatically:

@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e","col_f"}})
def super_func(df_input):
    ...