datatlas

Data profiling and data quality checks


License
MIT
Install
pip install datatlas==1.0.3

Documentation

Datatlas: Data Atlas

Data profiling and data quality checks

Installation & Updates

Run the following to install:

pip install datatlas

Run one of the following to update:

pip install datatlas --upgrade

pip install datatlas -U

Usage

# Import the library
import datatlas as dat

# Print 'Hello Atlas!' to test if the package installation and import were successful
dat.say_hello('Atlas') 

# Generate an output df with detailed stats about the input df including warnings for data quality issues that might need to be addressed before model training
df_profile = dat.df_profiling(df) 

'''
df_profiling(df, nulls_threshold=50.0, zeroes_threshold=50.0, cardinality_threshold=50.0)
    Input(s):
        1) df: Dataframe to analyse
        2) nulls_threshold (int): Default is 50. Adds warning if >50% nulls detected
        3) zeroes_threshold (int): Default 50. Adds warning if >50% zeroes detected for numerical dtype columns
        4) cardinality_threshold (int): Default 50. Adds warning if >50% unique values for categorical dtype columns
    Output(s):
        1) Dataframe with statistics and data quality warnings about the input df
'''

Versions

Please see the release_notes.txt file

Contributing

[section under construction]