cluster-over-sampling

A general interface for clustering-based over-sampling algorithms.


Keywords
machine, learning, imbalanced, oversampling, data-science, imbalanced-data, imbalanced-learning, machine-learning, python3, scikit-learn
License
MIT
Install
pip install cluster-over-sampling==0.6.0

Documentation


Introduction

A general interface for clustering-based over-sampling algorithms.

Installation

For user installation, cluster-over-sampling is available on PyPI and can be installed via pip:

pip install cluster-over-sampling

A development installation requires cloning the repository and then using PDM to install the project together with its main and development dependencies:

git clone https://github.com/georgedouzas/cluster-over-sampling.git
cd cluster-over-sampling
pdm install

The SOM clusterer requires optional dependencies, which can be installed through the som extra:

pip install "cluster-over-sampling[som]"

Usage

All classes in cluster-over-sampling follow the imbalanced-learn API and build on the functionality of its base oversampler. Following the scikit-learn convention, the data are represented as follows:

  • Input data X: 2D array-like or sparse matrix.
  • Targets y: 1D array-like.

The clustering-based oversamplers implement a fit method to learn from X and y:

clustering_based_oversampler.fit(X, y)

They also implement a fit_resample method to resample X and y:

X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
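To make the fit / fit_resample contract more concrete, here is a toy, dependency-free sketch. ToyClusterOverSampler is a hypothetical class written only for this illustration and is not part of cluster-over-sampling. As simplifying assumptions, it "clusters" by binning the first feature into equal-width bins and creates new minority samples by interpolating between two same-class samples from one cluster. The actual package delegates clustering and over-sampling to real estimators.

```python
import random
from collections import Counter


class ToyClusterOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample interface."""

    def __init__(self, n_clusters=2, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many samples each class needs to match the majority class.
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def _cluster(self, X):
        # Crude stand-in for a real clusterer: equal-width bins on feature 0.
        values = [row[0] for row in X]
        low, high = min(values), max(values)
        width = (high - low) / self.n_clusters or 1.0
        return [min(int((v - low) / width), self.n_clusters - 1) for v in values]

    def fit_resample(self, X, y):
        self.fit(X, y)
        rng = random.Random(self.random_state)
        cluster_labels = self._cluster(X)
        X_res, y_res = [list(row) for row in X], list(y)
        for target, n_new in self.sampling_strategy_.items():
            # Group the rows of this class by the cluster they fall into.
            groups = {}
            for row, label, cluster in zip(X, y, cluster_labels):
                if label == target:
                    groups.setdefault(cluster, []).append(row)
            clusters = sorted(groups)
            for _ in range(n_new):
                # Pick a cluster, then interpolate between two of its members.
                members = groups[rng.choice(clusters)]
                a, b = rng.choice(members), rng.choice(members)
                t = rng.random()
                X_res.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
                y_res.append(target)
        return X_res, y_res


# X is 2D array-like (rows of features), y is 1D array-like (class labels).
X = [[0.0, 0.1], [0.2, 0.0], [0.4, 0.3], [0.9, 1.0],
     [1.0, 0.8], [1.1, 0.9], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

sampler = ToyClusterOverSampler(n_clusters=2, random_state=0)
X_resampled, y_resampled = sampler.fit_resample(X, y)
print(Counter(y), Counter(y_resampled))
```

In this sketch the original rows are kept unchanged and the generated rows are appended after them, so the minority class grows until it matches the majority class count.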

References

If you use cluster-over-sampling in a scientific publication, we would appreciate citations to any of the following papers: