# korr Release 0.10.0

collection of utility functions for correlation analysis

Keywords: binary-correlation, confusion-matrix, correlation, correlation-analysis, correlation-matrix, correlation-pairs, eda, kendall, kendall-tau, matthews, p-value, pearson, pearson-correlation, pypi, python, rank-correlation, sample-correlation, spearman
License: Apache-2.0
Install: pip install korr==0.10.0

# korr


## Usage

Check the examples folder for notebooks.

Compute correlation matrix and its p-values

• pearson -- Pearson/Sample correlation (interval- and ratio-scale data)
• kendall -- Kendall's tau rank correlation (ordinal data)
• spearman -- Spearman's rho rank correlation (ordinal data)
• mcc -- Matthews correlation coefficient between binary variables
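As a rough illustration of what "correlation matrix and its p-values" means, the following sketch computes Pearson correlations for all column pairs with NumPy and attaches approximate two-sided p-values. It does not call korr: the function name `pearson_with_pvalues` is hypothetical, and the p-values come from the Fisher z-transformation (a normal approximation), which may differ from what korr computes.

```python
import math
import numpy as np


def pearson_with_pvalues(x):
    """Hypothetical helper (not korr's API).

    Returns (r, p) for the columns of x, where p holds approximate
    two-sided p-values from the Fisher z-transformation.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    r = np.corrcoef(x, rowvar=False)  # k-by-k Pearson correlation matrix
    p = np.zeros((k, k))              # diagonal stays 0 by convention here
    for i in range(k):
        for j in range(i + 1, k):
            # Clip so that atanh stays finite for (near-)perfect correlations
            rij = min(max(r[i, j], -0.999999), 0.999999)
            z = math.atanh(rij) * math.sqrt(n - 3)
            # Two-sided tail probability of a standard normal
            p[i, j] = p[j, i] = math.erfc(abs(z) / math.sqrt(2))
    return r, p


# Toy data: columns 0 and 1 are strongly related, column 2 is independent
rng = np.random.default_rng(42)
a = rng.normal(size=200)
x = np.column_stack([a, a + 0.1 * rng.normal(size=200), rng.normal(size=200)])
r, p = pearson_with_pvalues(x)
```

Both outputs are symmetric matrices of the same shape, so `r[i, j]` and `p[i, j]` always refer to the same pair of variables.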

EDA: dig deeper into the results

• flatten -- A table (pandas) with one row per correlation pair, holding the variable indices, the correlation, and the p-value. For example, try to find "good" cutoffs with corr_vs_pval, and then look up the variable indices with flatten afterwards.
• slice_yx -- Slice the correlation and p-value matrices of a (y,X) dataset into (y,x_i) vectors and (x_j, x_k) matrices
• corr_vs_pval -- Histogram to find p-value cutoffs (alpha) for a) highly correlated pairs, b) unrelated pairs, c) the mixed results.
• bracket_pval -- Histogram with more fine-grained p-value brackets.
• corrgram -- Correlogram, heatmap of correlations with p-values in brackets
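The ideas behind flatten and slice_yx can be sketched with plain NumPy. This is not korr's implementation: the names `flatten_pairs` and `slice_first_column` are hypothetical, the result is a list of tuples rather than a pandas table, and the sketch assumes that y is stored in the first column of the dataset.

```python
import numpy as np


def flatten_pairs(r, p):
    """Hypothetical helper: one (i, j, corr, pval) tuple per pair i < j,
    sorted by absolute correlation in descending order."""
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    i, j = np.triu_indices(r.shape[0], k=1)  # upper triangle without diagonal
    rows = [(int(a), int(b), float(r[a, b]), float(p[a, b]))
            for a, b in zip(i, j)]
    rows.sort(key=lambda t: abs(t[2]), reverse=True)
    return rows


def slice_first_column(m):
    """Hypothetical helper: split a (y, X) matrix into the (y, x_i) vector
    and the (x_j, x_k) matrix, assuming y is the first variable."""
    m = np.asarray(m)
    return m[0, 1:], m[1:, 1:]


r = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, -0.3],
              [0.1, -0.3, 1.0]])
p = np.array([[0.0, 0.001, 0.6],
              [0.001, 0.0, 0.04],
              [0.6, 0.04, 0.0]])

rows = flatten_pairs(r, p)
# rows == [(0, 1, 0.8, 0.001), (1, 2, -0.3, 0.04), (0, 2, 0.1, 0.6)]

r_yx, r_xx = slice_first_column(r)
# r_yx holds the correlations of y with x_1 and x_2; r_xx is 2-by-2
```

Keeping the indices next to each correlation and p-value is what makes it possible to apply a cutoff first and identify the affected variables afterwards.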

Utility functions

• confusion -- Confusion matrix. Required for the Matthews correlation (mcc), and a bit faster than sklearn's implementation
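To show how the confusion matrix feeds into the Matthews correlation coefficient, here is a minimal pure-Python sketch. The function names `confusion_counts` and `mcc_from_counts` are hypothetical and do not reflect korr's signatures or its return order. Returning 0.0 when the denominator is zero is a convention chosen for this example.

```python
import math


def confusion_counts(y_true, y_pred):
    """Hypothetical helper: count (tp, fn, fp, tn) for two binary sequences."""
    tp = fn = fp = tn = 0
    for t, q in zip(y_true, y_pred):
        if t and q:
            tp += 1
        elif t and not q:
            fn += 1
        elif not t and q:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def mcc_from_counts(tp, fn, fp, tn):
    """Matthews correlation coefficient from the four confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention for this sketch: undefined cases are reported as 0.0
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)  # (3, 1, 1, 3)
score = mcc_from_counts(*counts)           # 0.5
```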

Parameter Stability

• bootcorr -- Estimate multiple correlation matrices based on bootstrapped samples. From there you can assess how stable the correlation estimates are, i.e. how sensitive they are to in-sample variation. For example, stable estimates are good candidates for modeling, whereas unstable correlation pairs are good candidates for p-hacking and non-reproducibility.
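The bootstrap idea can be sketched as follows: resample the rows with replacement, recompute the correlation matrix each time, and inspect the spread of each entry across draws. This is an illustration under assumptions, not korr's code. The function name `bootstrap_corr` and its parameters `n_draws` and `seed` are hypothetical, and it uses Pearson correlations via NumPy.

```python
import numpy as np


def bootstrap_corr(x, n_draws=200, seed=0):
    """Hypothetical helper: stack of correlation matrices, one per
    bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    out = np.empty((n_draws, k, k))
    for b in range(n_draws):
        idx = rng.integers(0, n, size=n)  # row indices, with replacement
        out[b] = np.corrcoef(x[idx], rowvar=False)
    return out


# Toy data: columns 0 and 1 are strongly related, column 2 is independent
rng = np.random.default_rng(1)
a = rng.normal(size=300)
x = np.column_stack([a, a + 0.1 * rng.normal(size=300), rng.normal(size=300)])

draws = bootstrap_corr(x)
spread = draws.std(axis=0)  # per-pair standard deviation across draws
```

A small entry in `spread` indicates a correlation estimate that barely moves between resamples; a large entry flags a pair whose estimate depends heavily on which observations happen to be in the sample.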

Variable Selection, Search Functions

• mincorr -- From all estimated correlation pairs, pick a given number n=3,5,.. of variables with low and insignificant correlations among each other. (See the binsel package for an application.)
• find_best -- Find the N "best", i.e. highest and most significant, correlations
• find_worst -- Find the N "worst", i.e. insignificant/random and low, correlations
• find_unrelated -- Return the variable indices of unrelated pairs (in terms of insignificant p-values)
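One simple way to pick n mutually weakly correlated variables is a greedy search on the absolute correlation matrix. The sketch below is a swapped-in heuristic for illustration only; it is not claimed to be the algorithm behind mincorr, it ignores p-values, and the name `greedy_low_corr` is hypothetical.

```python
import numpy as np


def greedy_low_corr(r, n):
    """Hypothetical helper: greedily choose n variable indices with low
    absolute correlations among each other."""
    a = np.abs(np.asarray(r, dtype=float)).copy()
    k = a.shape[0]
    if not 2 <= n <= k:
        raise ValueError("n must be between 2 and the number of variables")
    np.fill_diagonal(a, np.inf)  # never select a variable paired with itself
    # Start with the least correlated pair
    i, j = np.unravel_index(np.argmin(a), a.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < n:
        rest = [v for v in range(k) if v not in chosen]
        # Add the variable whose largest |r| with the chosen set is smallest
        best = min(rest, key=lambda v: max(a[v, c] for c in chosen))
        chosen.append(best)
    return chosen


r = np.array([[1.00, 0.90, 0.10, 0.20],
              [0.90, 1.00, 0.15, 0.80],
              [0.10, 0.15, 1.00, 0.05],
              [0.20, 0.80, 0.05, 1.00]])

chosen = greedy_low_corr(r, 3)  # [2, 3, 0]
```

The greedy choice is cheap but not guaranteed to find the globally best subset, so it is best read as a starting point for a more thorough search.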

## Appendix

### Installation

The korr git repo is available as a PyPI package:

pip install korr

### Install a virtual environment

python3.7 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir

(If your git repo is stored in a folder whose path contains whitespace, then don't use the subfolder .venv. Use an absolute path without whitespace instead.)

### Commands

• Check syntax: flake8 --ignore=F401
• Run Unit Tests: pytest
• Remove .pyc files: find . -type f -name "*.pyc" -print0 | xargs -0 rm -f
• Remove __pycache__ folders: find . -type d -name "__pycache__" -print0 | xargs -0 rm -rf

Publish