imbutil
Additions to the imbalanced-learn package.
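from imblearn import pipeline
from sklearn.linear_model import LogisticRegression
from imbutil.combine import MinMaxRandomSampler

# classes with fewer than 100 samples are oversampled to 100; classes with more than 800 are undersampled to 800
sampler = MinMaxRandomSampler(min_freq=100, max_freq=800)
inner_clf = LogisticRegression()  # example estimator; any scikit-learn classifier can be used here
sampling_clf = pipeline.make_pipeline(sampler, inner_clf)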
1 Installation
pip install imbutil
Additionally, the MinMaxRandomSampler, like the RandomUnderSampler and RandomOverSampler samplers from imbalanced-learn, can technically be used with non-numeric data. However, the current implementation of imbalanced-learn forces a numeric-data check in all samplers. If you want to bypass this limitation, I have a fork of the project that does not require data to be numeric. You can install it with:
pip install git+https://github.com/shaypal5/imbalanced-learn.git@f6adc562fafdc2198931873799e725e5abdd65a1
2 Basic Use
imbutil additions adhere to the structure of the imblearn package:
2.1 combine
Contains samplers that both under-sample and over-sample:
MinMaxRandomSampler - Randomly samples data to bring all class frequencies into a given range.
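The idea behind bringing class frequencies into a range can be illustrated with a small pure-Python sketch. This is not imbutil's implementation: the functions `min_max_targets` and `min_max_random_sample` are hypothetical, and the sketch assumes that classes below `min_freq` are randomly oversampled (with replacement) up to `min_freq`, while classes above `max_freq` are randomly undersampled (without replacement) down to `max_freq`. Because it only selects row indices and never inspects feature values, it also works on non-numeric samples.

```python
import random
from collections import Counter


def min_max_targets(y, min_freq, max_freq):
    """Hypothetical helper: clip each class count into [min_freq, max_freq]."""
    counts = Counter(y)
    return {label: min(max(n, min_freq), max_freq) for label, n in counts.items()}


def min_max_random_sample(X, y, min_freq, max_freq, seed=None):
    """Hypothetical sketch: randomly resample (X, y) into the frequency range.

    Classes below min_freq keep all their rows plus extra rows drawn with
    replacement; classes above max_freq are subsampled without replacement;
    all other classes are kept unchanged.
    """
    rng = random.Random(seed)
    targets = min_max_targets(y, min_freq, max_freq)
    # Group row indices by class label; only indices are used, never values.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    chosen = []
    for label, idx in by_class.items():
        target = targets[label]
        if target > len(idx):
            # Oversample: keep every original row, then add random duplicates.
            chosen.extend(idx)
            chosen.extend(rng.choices(idx, k=target - len(idx)))
        elif target < len(idx):
            # Undersample: keep a random subset of the rows.
            chosen.extend(rng.sample(idx, target))
        else:
            chosen.extend(idx)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Non-numeric samples work, since rows are only picked by index.
X = ["red", "green", "blue", "cyan", "teal", "navy", "pink"]
y = ["a", "a", "a", "a", "a", "b", "c"]
X_res, y_res = min_max_random_sample(X, y, min_freq=2, max_freq=3, seed=0)
print(sorted(Counter(y_res).items()))  # [('a', 3), ('b', 2), ('c', 2)]
```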
3 Contributing
The package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome.
3.1 Installing for development
Clone:
git clone git@github.com:shaypal5/imbutil.git
Install in development mode, with test dependencies:
cd imbutil
pip install -e ".[test]"
3.2 Running the tests
To run the tests use:
cd imbutil
pytest
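pytest collects files named test_*.py (or *_test.py) and runs every function whose name starts with test. A minimal, hypothetical test in that style might look like the following; the file name and the test itself are illustrative only and are not part of the existing test suite.

```python
# Hypothetical file, e.g. test_class_counts.py; pytest would collect it
# because of the test_ prefix on both the file and the function name.
from collections import Counter


def test_counter_counts_each_label():
    y = ["a", "b", "a", "c", "a"]
    # A plain assert is all pytest needs; it reports the values on failure.
    assert dict(Counter(y)) == {"a": 3, "b": 1, "c": 1}
```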
3.3 Adding documentation
The project is documented using the NumPy docstring conventions, chosen because they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings. When documenting code you add to this project, please follow these conventions.
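For reference, a function documented in the NumPy style looks roughly like this. The function `class_frequencies` is a hypothetical example written for illustration; it is not part of imbutil.

```python
from collections import Counter


def class_frequencies(y):
    """Count how many samples belong to each class.

    (Hypothetical example function illustrating the NumPy docstring style.)

    Parameters
    ----------
    y : iterable
        Class labels, one per sample. Labels may be of any hashable type.

    Returns
    -------
    dict
        Mapping from each class label to the number of samples with that label.

    Examples
    --------
    >>> class_frequencies(['a', 'b', 'a'])
    {'a': 2, 'b': 1}
    """
    return dict(Counter(y))
```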
Additionally, if you update this README.rst file, use python setup.py checkdocs to validate that it compiles.
4 Credits
Created by Shay Palachy (shay.palachy@gmail.com).