NoisyDataCleaner

Python classes that identify and correct/remove noise in datasets

These models use Monte Carlo simulation to estimate how likely a given label is to be correct. Label correction builds on the noise detection model.
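The idea of estimating label correctness by repeated random sampling can be sketched as follows. This is a minimal illustration, not the package's implementation: the function names (`knn_predict`, `label_agreement`), the hand-written 3-nearest-neighbour classifier, the hold-out size, and the 0.5 threshold are all assumptions made for the example. Each trial holds out a random subset, trains on the rest, and records whether the prediction agrees with the given label; a low agreement rate marks a label as suspect.

```python
import random
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority label among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def label_agreement(data, n_trials=200, n_holdout=4, seed=0):
    """Per sample, the fraction of trials in which a model trained
    without that sample predicted its given label."""
    rng = random.Random(seed)
    agree = [0] * len(data)
    tested = [0] * len(data)
    for _ in range(n_trials):
        # Monte Carlo step: draw a fresh random hold-out set on every trial.
        holdout = set(rng.sample(range(len(data)), n_holdout))
        train = [data[i] for i in range(len(data)) if i not in holdout]
        for i in holdout:
            x, given_label = data[i]
            tested[i] += 1
            agree[i] += knn_predict(train, x) == given_label
    # A sample that was never held out has no estimate (None).
    return [a / t if t else None for a, t in zip(agree, tested)]


# Toy 1-D data (illustrative only): two well-separated classes plus one bad label.
data = [(v / 5, 0) for v in range(6)]        # 0.0 ... 1.0, labelled 0
data += [(9 + v / 5, 1) for v in range(6)]   # 9.0 ... 10.0, labelled 1
data.append((0.5, 1))                        # deliberately mislabelled

scores = label_agreement(data)
suspects = [data[i] for i, s in enumerate(scores) if s is not None and s < 0.5]
print(suspects)
```

On this toy data the only sample flagged is the deliberately mislabelled one, because its neighbours consistently vote for the other class.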

Install:

pip install noisydatacleaner

Models:

NoiseRemover: Identifies and then removes noise from the dataset. Random Forest is used for smaller datasets because it yields better results, whereas k-Nearest Neighbors is used for larger datasets because it is much more efficient.
LabelClassificationCorrector: Corrects the labels of classification datasets. Instead of a single model, as in NoiseRemover, it uses five different models:

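from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# The five classifiers used by LabelClassificationCorrector
models = {
    'Random Forest': RandomForestClassifier(n_estimators=128),
    'Extra Trees': ExtraTreesClassifier(n_estimators=128),
    'Linear Discriminant': LinearDiscriminantAnalysis(),
    'Logistic Regression': LogisticRegression(max_iter=128),
    'Neural Network': MLPClassifier(hidden_layer_sizes=(128, 64, 32))
}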

All of these come from the scikit-learn (sklearn) library.
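How the five models' outputs are combined is not described here. One plausible scheme, shown purely as an assumption, is a majority vote that relabels a sample only when most models agree on a different label. In the sketch below, `correct_labels` and `min_agreement` are hypothetical names, and the per-model predictions are made-up values for illustration, not real model output.

```python
from collections import Counter


def correct_labels(given_labels, model_predictions, min_agreement=0.8):
    """Relabel a sample only when a large share of models agree on a different label."""
    corrected = []
    n_models = len(model_predictions)
    for i, given in enumerate(given_labels):
        # Count each model's vote for sample i.
        votes = Counter(preds[i] for preds in model_predictions.values())
        winner, count = votes.most_common(1)[0]
        if winner != given and count / n_models >= min_agreement:
            corrected.append(winner)   # strong consensus against the given label
        else:
            corrected.append(given)    # keep the original label
    return corrected


# Illustrative (made-up) predictions for four samples from five models.
given = [0, 1, 1, 0]
predictions = {
    'Random Forest':       [0, 1, 0, 0],
    'Extra Trees':         [0, 1, 0, 1],
    'Linear Discriminant': [0, 1, 0, 0],
    'Logistic Regression': [0, 1, 0, 1],
    'Neural Network':      [0, 0, 0, 1],
}
print(correct_labels(given, predictions))
```

With these values the third sample is relabelled because all five models disagree with its given label, while the fourth keeps its label because only three of five models disagree, which is below the assumed 0.8 threshold.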

noisydatacleaner
Release 1.0.5
