data-complexity

Data Complexity Measures in Python

License

MIT

Install

$ pip install data-complexity

To pin version 0.1.3:

$ pip install data-complexity==0.1.3

How it works

Maximum Fisher's Discriminant Ratio (F1)

from dcm import dcm
from sklearn import datasets

# Load the Iris dataset: X is the feature matrix, y holds the class labels
iris = datasets.load_iris()
X = iris.data
y = iris.target

index, F1 = dcm.F1(X, y)
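
To make the measure concrete, here is a minimal NumPy sketch of the per-feature Fisher discriminant ratio as formulated in the survey [1]. It is an illustration under assumptions, not the package's implementation: the helper name `fisher_ratio_sketch` is hypothetical, and since this README does not say whether `dcm.F1` reports the raw maximum ratio or the normalised form 1 / (1 + max ratio), the sketch returns both along with the index of the most discriminative feature.

```python
import numpy as np

def fisher_ratio_sketch(X, y):
    """Hypothetical helper (not part of dcm): per-feature Fisher ratio."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter, one value per feature
    within = np.zeros(X.shape[1])   # within-class scatter, one value per feature
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Assumption: every feature varies within at least one class. A feature
    # with zero within-class scatter divides by zero and needs separate handling.
    ratios = between / within
    best = int(np.argmax(ratios))
    return best, ratios[best], 1.0 / (1.0 + ratios[best])

# Toy data: feature 0 separates the two classes, feature 1 does not.
X_toy = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
best_feature, max_ratio, f1_normalised = fisher_ratio_sketch(X_toy, y_toy)
```

On the toy data the first feature has the larger ratio, so it is the one selected. A large ratio (equivalently, a small normalised value) suggests that at least one feature separates the classes well on its own.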

Fraction of Borderline Points (N1)

from dcm import dcm
from sklearn import datasets

# Load the breast cancer dataset as pandas objects, then convert to NumPy arrays
bc = datasets.load_breast_cancer(as_frame=True)
X = bc.data.values
y = bc.target.values

N1 = dcm.N1(X, y)
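
As a rough illustration of what N1 measures, the sketch below builds a minimum spanning tree over the points and reports the fraction of points that share a tree edge with a point of another class, following the description in the survey [1]. The helper name `n1_sketch` is hypothetical, the choice of plain Euclidean distance is an assumption (other implementations may normalise features or use a different metric), and the result is not guaranteed to match `dcm.N1`.

```python
import numpy as np

def n1_sketch(X, y):
    """Hypothetical helper (not part of dcm): fraction of borderline points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # Dense pairwise Euclidean distances; memory grows as n * n * n_features.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Prim's algorithm: grow the tree from point 0, one point per iteration.
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()          # cheapest link from each point to the tree
    best_from = np.zeros(n, dtype=int)  # tree endpoint of that cheapest link
    borderline = np.zeros(n, dtype=bool)

    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))  # nearest point not yet in the tree
        i = int(best_from[j])
        if y[i] != y[j]:
            # A tree edge joining two classes marks both endpoints as borderline.
            borderline[i] = True
            borderline[j] = True
        in_tree[j] = True
        closer = dist[j] < best_dist
        best_dist = np.where(closer, dist[j], best_dist)
        best_from = np.where(closer, j, best_from)

    return float(borderline.mean())

# Toy data: two well-separated 1-D clusters joined by a single tree edge.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
n1_toy = n1_sketch(X_toy, y_toy)
```

In the toy example only the two points facing each other across the gap are borderline, so the value is low. Values near 1 would indicate that most points sit next to a point of a different class.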

Entropy of Class Proportions (C1) and Imbalance Ratio (C2)

from dcm import dcm
from sklearn import datasets

# Load the breast cancer dataset as pandas objects, then convert to NumPy arrays
bc = datasets.load_breast_cancer(as_frame=True)
X = bc.data.values
y = bc.target.values

C1, C2 = dcm.C12(X, y)
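
Both measures depend only on the class labels, which the following sketch makes explicit. It uses one common formulation: C1 as the entropy of the class proportions normalised by log(number of classes), and C2 as 1 - 1/IR, where IR is the multi-class imbalance ratio. The helper name `class_balance_sketch` is hypothetical, and conventions differ between implementations (for example, whether a higher C1 means more or less balance), so the values may not match what `dcm.C12` returns.

```python
import numpy as np

def class_balance_sketch(y):
    """Hypothetical helper (not part of dcm); assumes at least two classes."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    n = counts.sum()
    n_classes = len(counts)
    p = counts / n

    # C1: normalised entropy of class proportions (1.0 when perfectly balanced).
    c1 = -(p * np.log(p)).sum() / np.log(n_classes)

    # C2: 1 - 1/IR, with IR = (nc - 1) / nc * sum_i n_i / (n - n_i).
    # IR equals 1 for balanced classes, so C2 is 0.0 in that case.
    imbalance_ratio = (n_classes - 1) / n_classes * (counts / (n - counts)).sum()
    c2 = 1.0 - 1.0 / imbalance_ratio
    return float(c1), float(c2)

c1_balanced, c2_balanced = class_balance_sketch([0, 0, 1, 1])
c1_skewed, c2_skewed = class_balance_sketch([0, 0, 0, 1])
```

Under this formulation the balanced labels give C1 = 1 and C2 = 0, while the 3-to-1 split gives a lower C1 and a higher C2.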

Other Measures

Coming soon...

References

[1] How Complex is your classification problem? A survey on measuring classification complexity, https://arxiv.org/abs/1808.03591

[2] The Extended Complexity Library (ECoL), https://github.com/lpfgarcia/ECoL