Implementation of the KSU compression algorithm https://www.cs.bgu.ac.il/~karyeh/compression-arxiv.pdf


License: MIT

KSU Compression Algorithm Implementation

Algorithm 1 from Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions

Installation

  • With pip: pip install ksu
  • From source:
    • git clone --depth=1 https://github.com/nimroha/ksu_classifier.git
    • cd ksu_classifier
    • python setup.py install

Usage

Command Line

This package provides two command-line tools, e-net and ksu:

  • e-net constructs an epsilon net for a given epsilon
  • ksu runs the full algorithm

Both accept the -h flag to describe their arguments, and both can save the result to disk in numpy's .npz format.
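
For example, to inspect each tool's arguments (only the -h flag is documented here; run it to see the rest):

    e-net -h
    ksu -h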


Code

This package provides a class KSU(Xs, Ys, metric, [gram, prune, logLevel, n_jobs]) with the following parameters:

Xs and Ys are the data points and their respective labels as numpy arrays

metric is either a callable to compute the metric or a string that names one of our provided metrics (print KSU.METRICS.keys() for the full list)

gram (optional, default=None) a precomputed Gram matrix; it will be computed if not provided.

prune (optional, default=False) a boolean indicating whether to prune the compressed set or not (Algorithm 2 from Near-optimal sample compression for nearest neighbors)

logLevel (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)

n_jobs (optional, default=1) an integer defining how many CPUs to use (scipy semantics); pass -1 to use all. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used, so n_jobs = -2 uses all CPUs but one.
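
A minimal construction sketch on synthetic data. The import path from ksu.KSU import KSU is an assumption and may differ in the installed package; 'euclidean' is one of the built-in metrics listed below:

    import numpy as np
    from ksu.KSU import KSU  # assumed import path; adjust to the installed package layout

    # synthetic data: 500 points in R^8 with binary labels
    Xs = np.random.rand(500, 8)
    Ys = np.random.randint(0, 2, size=500)

    # 'euclidean' is a built-in metric name; n_jobs=-1 uses all CPUs
    ksu = KSU(Xs, Ys, 'euclidean', n_jobs=-1)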


KSU provides a method compressData([delta, minCompress, maxCompress, greedy, stride, logLevel, numProcs]), which selects the subset with the lowest estimated error with confidence 1 - delta. It accepts the following arguments:

delta (optional, default=0.1) confidence for error upper bound

minCompress (optional, default=0.05) minimal compression ratio

maxCompress (optional, default=0.1) maximum compression ratio

greedy (optional, default=True) whether to use a greedy or hierarchical strategy for net construction

stride (optional, default=200) how many gamma values to skip between iterations (similar gammas produce similar nets)

logLevel (optional, default='CRITICAL') a string indicating the logging level (set to 'INFO' or 'DEBUG' to get more information)

numProcs (optional, default=1) number of processes to use
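
Continuing the sketch above, a call with the documented defaults spelled out (illustrative, not exhaustive):

    # compress with 90% confidence (delta=0.1) and the default compression bounds
    ksu.compressData(delta=0.1, minCompress=0.05, maxCompress=0.1, greedy=True)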


You can then run getClassifier(), which returns a 1-NN classifier (based on sklearn's K-NN) fitted to the compressed data.

Or run getCompressedSet() to get the compressed data as a tuple of numpy arrays (compressedXs, compressedYs).
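
Putting it together (Xtest is a hypothetical array of query points; predict is the standard sklearn interface):

    clf = ksu.getClassifier()                # 1-NN classifier fitted to the compressed set
    predictions = clf.predict(Xtest)         # Xtest: hypothetical test points
    compressedXs, compressedYs = ksu.getCompressedSet()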


See scripts/ for example usage.

Built-in metrics

['chebyshev', 'yule', 'sokalmichener', 'canberra', 'EarthMover', 'rogerstanimoto', 'matching', 'dice', 'EditDistance', 'braycurtis', 'russellrao', 'cosine', 'cityblock', 'l1', 'manhattan', 'sqeuclidean', 'jaccard', 'seuclidean', 'sokalsneath', 'kulsinski', 'minkowski', 'mahalanobis', 'euclidean', 'l2', 'hamming', 'correlation', 'wminkowski']
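
To list these programmatically (same assumed import path as above):

    from ksu.KSU import KSU  # assumed import path
    print(KSU.METRICS.keys())  # full list of built-in metric names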