Additions to the imbalanced-learn Python package.
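from imblearn import pipeline
from imbutil.combine import MinMaxRandomSampler
from sklearn.linear_model import LogisticRegression

# any scikit-learn compatible estimator can serve as the inner classifier
inner_clf = LogisticRegression()

# oversampling minority classes to 100 and undersampling majority classes to 800
sampler = MinMaxRandomSampler(min_freq=100, max_freq=800)
sampling_clf = pipeline.make_pipeline(sampler, inner_clf)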
pip install imbutil
MinMaxRandomSampler, like some of the samplers in imbalanced-learn, can technically be used with non-numeric data. However, the current implementation of imbalanced-learn forces a numeric-data check on all samplers. If you want to bypass this limitation, I have a fork of the project that does not force data to be numeric. You can install it with:
pip install git+https://github.com/shaypal5/imbalanced-learn.git@f6adc562fafdc2198931873799e725e5abdd65a1
imbutil additions adhere to the structure of the imbalanced-learn package:
imbutil.combine - Contains samplers that both under-sample and over-sample:
MinMaxRandomSampler - Randomly samples data to bring all class frequencies into a given range.
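To make the idea behind MinMaxRandomSampler concrete, here is a minimal pure-Python sketch of min-max random sampling. It is not imbutil's implementation: the function name `min_max_random_sample`, its signature, and the choice to keep every original row of an under-represented class and draw only the shortfall with replacement are assumptions made for illustration. Because the sketch only selects rows by index, it works just as well on non-numeric data such as strings.

```python
import random
from collections import Counter, defaultdict


def min_max_random_sample(X, y, min_freq, max_freq, seed=None):
    """Hypothetical sketch (not imbutil's API): resample (X, y) so that every
    class count lies in [min_freq, max_freq]."""
    if min_freq > max_freq:
        raise ValueError("min_freq must not exceed max_freq")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    chosen = []
    for idx in by_class.values():
        n = len(idx)
        if n < min_freq:
            # Over-sample: keep all original rows, then draw the
            # shortfall with replacement.
            chosen.extend(idx)
            chosen.extend(rng.choices(idx, k=min_freq - n))
        elif n > max_freq:
            # Under-sample: draw max_freq distinct rows without replacement.
            chosen.extend(rng.sample(idx, max_freq))
        else:
            # Already inside the range: keep the class untouched.
            chosen.extend(idx)

    # Only indices are used, so X may hold arbitrary objects (e.g. strings).
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Usage: three classes with 30, 500 and 5000 samples of text data.
y = ["a"] * 30 + ["b"] * 500 + ["c"] * 5000
X = [f"doc-{i}" for i in range(len(y))]
X_res, y_res = min_max_random_sample(X, y, min_freq=100, max_freq=800, seed=0)
print(Counter(y_res))  # Counter({'c': 800, 'b': 500, 'a': 100})
```

The minority class "a" is raised to 100 samples, the majority class "c" is cut down to 800, and "b", which already lies in the range, is left as-is.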
Package author and current maintainer is Shay Palachy (email@example.com); you are more than welcome to approach him for help. Contributions are very welcome.
Clone the repository:
git clone git@github.com:shaypal5/imbutil.git
Install in development mode, and with test dependencies:
cd imbutil
pip install -e ".[test]"
To run the tests use:
cd imbutil
pytest
The project is documented using the NumPy docstring conventions, which were chosen because they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings. When documenting code you add to this project, please follow these conventions.
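As an illustration of these conventions, here is a hypothetical helper documented with the standard NumPy-style `Parameters`, `Returns` and `Examples` sections. The function `clip_frequency` is invented for this example and is not part of imbutil.

```python
def clip_frequency(count, min_freq, max_freq):
    """Clip a class frequency into the range [min_freq, max_freq].

    Parameters
    ----------
    count : int
        The current number of samples in a class.
    min_freq : int
        The lowest allowed class frequency.
    max_freq : int
        The highest allowed class frequency.

    Returns
    -------
    int
        ``count`` clipped into ``[min_freq, max_freq]``.

    Examples
    --------
    >>> clip_frequency(30, 100, 800)
    100
    >>> clip_frequency(5000, 100, 800)
    800
    """
    # Raise counts below min_freq, lower counts above max_freq.
    return max(min_freq, min(count, max_freq))
```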
Additionally, if you update this README.rst file, use python setup.py checkdocs to validate that it compiles.
Created by Shay Palachy (email@example.com).