Ball: A Python Package for Detecting Distribution Difference and Association in Metric Spaces


Keywords: ball-correlation, ball-covariance, ball-divergence, feature-selection, independence-tests, k-sample-test, sure-independence-screening
License: GPL-3.0
Install: pip install Ball==0.2.9

Ball Statistics

Introduction

Three fundamental problems in data mining, statistical analysis, and machine learning are:

  • Are several distributions different?
  • Are random variables dependent?
  • How can useful variables/features be picked out from high-dimensional data?
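For the first question, the relevant Ball statistic is the Ball divergence. Below is a minimal pure-Python sketch of a two-sample Ball divergence statistic with a permutation p-value. It is written from the usual sample definition and is not taken from the Ball package. The definition works as follows. For every pair of points in one sample, take the closed ball centred at the first point that passes through the second. Then compare the fraction of each sample that falls inside that ball. The sketch does not use the package's API. The names `ball_divergence`, `bd_permutation_test` and `absolute_distance` are hypothetical, and any metric can be passed as `dist`.

```python
import random


def absolute_distance(a, b):
    """Metric on the real line; swap in any other metric for non-scalar data."""
    return abs(a - b)


def ball_divergence(x, y, dist=absolute_distance):
    """Sample Ball divergence between samples x and y (naive sketch)."""

    def coverage_gap(a, b):
        # For every ball centred at a[i] with radius dist(a[i], a[j]),
        # compare the fraction of `a` and the fraction of `b` inside it.
        total = 0.0
        for centre in a:
            for edge in a:
                radius = dist(centre, edge)
                frac_a = sum(dist(centre, u) <= radius for u in a) / len(a)
                frac_b = sum(dist(centre, v) <= radius for v in b) / len(b)
                total += (frac_a - frac_b) ** 2
        return total / len(a) ** 2

    # Balls built from x, plus balls built from y.
    return coverage_gap(x, y) + coverage_gap(y, x)


def bd_permutation_test(x, y, n_perm=99, seed=0, dist=absolute_distance):
    """Permutation p-value for H0: x and y come from the same distribution."""
    rng = random.Random(seed)
    observed = ball_divergence(x, y, dist)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        stat = ball_divergence(pooled[: len(x)], pooled[len(x):], dist)
        if stat >= observed:
            exceed += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(15)]
    y = [rng.gauss(1.0, 1.0) for _ in range(15)]  # location-shifted sample
    stat, p_value = bd_permutation_test(x, y)
    print(f"BD = {stat:.4f}, permutation p-value = {p_value:.3f}")
```

Every ball is recomputed from scratch. This naive version therefore needs on the order of (n + m)^3 distance evaluations per statistic and is only meant for small samples.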

These problems can be tackled with Ball statistics, which enjoy the following advantages:

  • applicable to most kinds of datasets (e.g., traditional tabular data, brain shapes, functional connectomes, wind directions, and so on);
  • insensitive to outliers, distribution-free, and model-free;
  • theoretically guaranteed and computationally efficient.
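The first advantage, applicability to many kinds of data, comes from Ball statistics needing only a distance function. The sketch below illustrates this with the unweighted sample Ball covariance, the dependence measure behind the independence test. It is plain Python and independent of the Ball package's API. Each variable gets its own metric, so circular data such as wind directions can use an angular distance. The names `ball_covariance`, `angular_distance` and `absolute_distance`, and the synthetic data, are hypothetical. The function returns the squared statistic. Permuting `y` would turn it into an independence test; that step is omitted here.

```python
import math
import random


def absolute_distance(a, b):
    """Ordinary metric on the real line."""
    return abs(a - b)


def angular_distance(a, b):
    """Shortest arc, in degrees, between two directions given in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def ball_covariance(x, y, dist_x, dist_y):
    """Squared, unweighted sample Ball covariance of paired samples x and y.

    Naive O(n^3) sketch: for each pair (i, j), compare the joint fraction of
    points lying in both balls with the product of the marginal fractions.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for i in range(n):
        for j in range(n):
            rx = dist_x(x[i], x[j])  # radius of the ball around x[i]
            ry = dist_y(y[i], y[j])  # radius of the ball around y[i]
            in_x = [dist_x(x[i], xk) <= rx for xk in x]
            in_y = [dist_y(y[i], yk) <= ry for yk in y]
            joint = sum(a and b for a, b in zip(in_x, in_y)) / n
            p_x = sum(in_x) / n
            p_y = sum(in_y) / n
            total += (joint - p_x * p_y) ** 2
    return total / n ** 2


if __name__ == "__main__":
    rng = random.Random(2)
    directions = [rng.uniform(0.0, 360.0) for _ in range(30)]
    # Synthetic "speed" that depends on direction through a cosine.
    speeds = [5.0 + 3.0 * math.cos(math.radians(d)) + rng.gauss(0.0, 0.5)
              for d in directions]
    print(ball_covariance(directions, speeds, angular_distance, absolute_distance))
```

With `angular_distance`, directions of 359° and 1° are treated as 2° apart. An absolute-difference metric would instead put them 358° apart.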

Software

R package

Install the Ball package from CRAN:

install.packages("Ball")

Comparison with selected R packages that handle datasets in metric spaces:

fastmit energy HHG Ball
Test of equal distributions ✔️ ✔️ ✔️
Test of independence ✔️ ✔️ ✔️ ✔️
Test of joint independence ✔️
Feature screening / Sure Independence Screening (SIS) ✔️
Iterative Feature screening / Iterative SIS ✔️
Datasets in metric spaces ✔️ SNT ✔️ ✔️
Robustness ✔️ ✔️ ✔️
Parallel programming ✔️ ✔️
Computational efficiency 🏃🏃🏃 🏃🏃🏃 🏃🏃 🏃🏃🚶

SNT stands for strong negative type (energy supports datasets in metric spaces of strong negative type only).
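Feature screening (SIS) ranks each feature by a marginal dependence measure with the response and keeps the top-ranked ones. Below is a minimal sketch of this idea using the squared sample Ball correlation, that is, the Ball covariance normalized by the two auto-covariances. It is a plain-Python illustration and not the Ball package's screening routine. The function names, the column-list data layout, and the cut-off `d` are assumptions. Only plain screening is shown, not the iterative variant.

```python
import math
import random


def absolute_distance(a, b):
    """Ordinary metric on the real line."""
    return abs(a - b)


def ball_covariance(x, y, dist_x=absolute_distance, dist_y=absolute_distance):
    """Squared, unweighted sample Ball covariance (naive O(n^3) sketch)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            rx, ry = dist_x(x[i], x[j]), dist_y(y[i], y[j])
            in_x = [dist_x(x[i], xk) <= rx for xk in x]
            in_y = [dist_y(y[i], yk) <= ry for yk in y]
            joint = sum(a and b for a, b in zip(in_x, in_y)) / n
            total += (joint - (sum(in_x) / n) * (sum(in_y) / n)) ** 2
    return total / n ** 2


def ball_correlation(x, y):
    """Squared sample Ball correlation; 0.0 if either variable is constant."""
    denom = math.sqrt(ball_covariance(x, x) * ball_covariance(y, y))
    return ball_covariance(x, y) / denom if denom > 0 else 0.0


def ball_sis(features, response, d):
    """Rank feature columns by Ball correlation with the response; keep top d."""
    scores = [ball_correlation(column, response) for column in features]
    ranked = sorted(range(len(features)), key=lambda k: scores[k], reverse=True)
    return ranked[:d], scores


if __name__ == "__main__":
    rng = random.Random(3)
    n, p = 30, 6
    features = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    # Synthetic response driven by feature 3 through a non-linear (squared) map.
    response = [features[3][i] ** 2 + 0.1 * rng.gauss(0.0, 1.0) for i in range(n)]
    selected, scores = ball_sis(features, response, d=2)
    print("selected feature indices:", selected)
```

The squared map in the usage example has near-zero linear correlation with feature 3. A dependence measure that goes beyond linear correlation is what lets screening pick up such features.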

See the following documents for more details about the Ball package:

Python package

Install the Ball package from PyPI:

pip install Ball

Citation

If you use Ball or reference our vignettes in a presentation or publication, we would appreciate citations of our package.

Zhu J, Pan W, Zheng W, Wang X (2021). “Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces.” Journal of Statistical Software, 97(6), 1–31. doi: 10.18637/jss.v097.i06.

Here is the corresponding BibTeX entry:

@Article{,
  title = {{Ball}: An {R} Package for Detecting Distribution Difference and Association in Metric Spaces},
  author = {Jin Zhu and Wenliang Pan and Wei Zheng and Xueqin Wang},
  journal = {Journal of Statistical Software},
  year = {2021},
  volume = {97},
  number = {6},
  pages = {1--31},
  doi = {10.18637/jss.v097.i06},
}

Bug report

Open an issue or send an email to Jin Zhu at zhuj37@mail2.sysu.edu.cn