korr
collection of utility functions for correlation analysis
Usage
Check the examples folder for notebooks.
Compute correlation matrix and its p-values
 pearson  Pearson/sample correlation (interval- and ratio-scale data)
 kendall  Kendall's tau rank correlation (ordinal data)
 spearman  Spearman's rho rank correlation (ordinal data)
 mcc  Matthews correlation coefficient between binary variables
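
To illustrate what "a correlation matrix and its p-values" means, here is a minimal sketch built on scipy.stats rather than on korr itself. The helper name corr_with_pvalues and the random demo data are assumptions made for illustration; they are not part of the korr API.

```python
import numpy as np
from scipy import stats


def corr_with_pvalues(X, method=stats.pearsonr):
    """Pairwise correlation and p-value matrices for the columns of X.

    Hypothetical helper for illustration; not a korr function.
    """
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    r = np.eye(n_vars)                # each variable correlates perfectly with itself
    p = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r[i, j], p[i, j] = method(X[:, i], X[:, j])
            r[j, i], p[j, i] = r[i, j], p[i, j]   # mirror into the lower triangle
    return r, p


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]              # make columns 0 and 1 correlated

r, p = corr_with_pvalues(X)                          # Pearson
rho, p_rho = corr_with_pvalues(X, stats.spearmanr)   # Spearman rank correlation
```

Passing stats.kendalltau as method would give Kendall's tau in the same way, since all three scipy functions return a (statistic, p-value) pair.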
EDA, Dig deeper into results

 flatten  A table (pandas) with one row per correlation pair, holding the variable indices, the correlation, and the p-value. For example, try to find "good" cutoffs with corr_vs_pval, and then look up the variable indices with flatten afterwards.
 slice_yx  Slice a correlation and p-value matrix of a (y, X) dataset into a (y, x_i) vector and (x_j, x_k) matrices
 corr_vs_pval  Histogram to find p-value cutoffs (alpha) for a) highly correlated pairs, b) unrelated pairs, c) the mixed results.
 bracket_pval  Histogram with more fine-grained p-value brackets.
 corrgram  Correlogram, heatmap of correlations with p-values in brackets
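
The idea behind a flattened table can be sketched with numpy and pandas as follows. The function name flatten_sketch, the column names (i, j, cor, pval), and the cutoff values are assumptions for illustration and may differ from what korr's flatten returns.

```python
import numpy as np
import pandas as pd


def flatten_sketch(r, p):
    """One row per variable pair (i < j) with its correlation and p-value."""
    i, j = np.triu_indices_from(r, k=1)   # upper triangle, diagonal excluded
    return pd.DataFrame({"i": i, "j": j, "cor": r[i, j], "pval": p[i, j]})


# Small hand-made correlation and p-value matrices (illustrative numbers)
r = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, -0.4],
              [0.1, -0.4, 1.0]])
p = np.array([[0.0, 0.001, 0.60],
              [0.001, 0.0, 0.03],
              [0.60, 0.03, 0.0]])

table = flatten_sketch(r, p)
# Keep strong and significant pairs; both cutoffs are arbitrary examples
strong = table[(table["cor"].abs() > 0.3) & (table["pval"] < 0.05)]
```

Once cutoffs have been chosen from a histogram, a filter like the last line is one way to look up which variable pairs fall on either side of them.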
Utility functions
 confusion  Confusion matrix. Required for the Matthews correlation (mcc) and a bit faster than sklearn's.
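
The link between the confusion matrix and the Matthews correlation can be shown with a short numpy sketch. The function names confusion_counts and mcc_from_counts, and the convention of returning 0.0 when the denominator is zero, are assumptions for illustration; korr's own confusion and mcc functions may use different signatures and conventions.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for two binary 0/1 vectors."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    return tp, fn, fp, tn


def mcc_from_counts(tp, fn, fp, tn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


a = [1, 1, 1, 0, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(a, b)     # (3, 1, 1, 3)
score = mcc_from_counts(*counts)    # (3*3 - 1*1) / sqrt(4*4*4*4) = 0.5
```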
Parameter Stability
 bootcorr  Estimate multiple correlation matrices based on bootstrapped samples. From there you can assess how stable the correlation estimates are, i.e. how sensitive they are to in-sample variation. For example, stable estimates are good candidates for modeling, whereas unstable correlation pairs are good candidates for p-hacking and non-reproducibility.
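
A bootstrap stability check of this kind might look like the following numpy sketch. The function name bootstrap_corr, its parameters (n_draws, seed), and the use of the standard deviation as a stability measure are assumptions for illustration, not the bootcorr implementation.

```python
import numpy as np


def bootstrap_corr(X, n_draws=200, seed=0):
    """Stack of Pearson correlation matrices, one per bootstrap resample of the rows."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_obs, n_vars = X.shape
    out = np.empty((n_draws, n_vars, n_vars))
    for b in range(n_draws):
        idx = rng.integers(0, n_obs, size=n_obs)   # sample rows with replacement
        out[b] = np.corrcoef(X[idx], rowvar=False)
    return out


rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
X[:, 1] += 2.0 * X[:, 0]           # strong dependence between columns 0 and 1

draws = bootstrap_corr(X)
spread = draws.std(axis=0)         # per-pair standard deviation across resamples
```

A small entry in spread indicates an estimate that barely moves between resamples; a large entry flags a pair whose correlation depends heavily on which observations happen to be in the sample.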
Variable Selection, Search Functions

 mincorr  From all estimated correlation pairs, pick a given number (n=3,5,..) of variables with low and insignificant correlations among each other. (See the binsel package for an application.)
 find_best  Find the N "best", i.e. high and most significant, correlations
 find_worst  Find the N "worst", i.e. insignificant/random and low, correlations
 find_unrelated  Return variable indices of unrelated pairs (in terms of insignificant p-value)
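
One possible reading of "best" and "worst" pairs is sketched below in numpy. The function name rank_pairs and the ranking rule (p-value first, absolute correlation as tie-breaker) are assumptions for illustration; korr's find_best and find_worst may rank pairs differently.

```python
import numpy as np


def rank_pairs(r, p, n=2, best=True):
    """Return n variable pairs (i, j), ranked by p-value and then by |r|."""
    i, j = np.triu_indices_from(r, k=1)
    absr, pv = np.abs(r[i, j]), p[i, j]
    # np.lexsort uses the LAST key as primary: p-value first, |r| breaks ties
    order = np.lexsort((-absr, pv)) if best else np.lexsort((absr, -pv))
    return [(int(i[k]), int(j[k])) for k in order[:n]]


# Build symmetric 4x4 matrices from illustrative upper-triangle values,
# listed in the order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
r = np.eye(4)
p = np.zeros((4, 4))
iu = np.triu_indices(4, k=1)
r[iu] = [0.9, 0.1, -0.7, 0.05, 0.3, -0.2]
p[iu] = [0.001, 0.6, 0.004, 0.8, 0.04, 0.2]
r, p = r + r.T - np.eye(4), p + p.T

best = rank_pairs(r, p, n=2, best=True)     # [(0, 1), (0, 3)]
worst = rank_pairs(r, p, n=2, best=False)   # [(1, 2), (0, 2)]
```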
Appendix
Installation
The korr git repo is available as a PyPI package:
pip install korr
Install a virtual environment
python3.7 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder with whitespaces, then don't use the subfolder .venv. Use an absolute path without whitespaces.)
Commands
 Check syntax: flake8 --ignore=F401
 Run Unit Tests: pytest
 Remove .pyc files: find . -type f -name "*.pyc" | xargs rm
 Remove __pycache__ folders: find . -type d -name "__pycache__" | xargs rm -rf
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.