# hdmedians Release 0.14.1

High-dimensional medians

Keywords
high-dimensional-data, machine-learning, median, python, statistics
Apache-2.0/libpng-2.0
Install
``` pip install hdmedians==0.14.1 ```

# Hdmedians

Did you know there is no unique way to mathematically extend the concept of a median to higher dimensions?

Various definitions for a high-dimensional median exist and this Python package provides a number of fast implementations of these definitions. Medians are extremely useful due to their high breakdown point (up to 50% contamination) and have a number of nice applications in machine learning, computer vision, and high-dimensional statistics.

This package currently has implementations of medoid and geometric median with support for missing data using `NaN`.

### Installation

The latest version of the package is always available on pypi, so can be easily installed by typing:

``````pip3 install hdmedians
``````

## Medoid

Given a finite set of -dimensional observation vectors , the medoid of these observations is given by

The current implementation of `medoid` is in vectorized Python and can handle any data type supported by ndarray. If you would like the algorithm to take care of missing values encoded as `nan` then you can use the `nanmedoid` function.

### Examples

Create an 6 x 10 array of random integer observations.

``````>>> import numpy as np
>>> X = np.random.randint(100, size=(6, 10))
array([[12,  9, 61, 76,  2, 17, 12, 11, 26,  0],
[65, 72,  7, 64, 21, 92, 51, 48,  9, 65],
[39,  7, 50, 56, 29, 79, 47, 45, 10, 52],
[70, 12, 23, 97, 86, 14, 42, 90, 15, 16],
[13,  7,  2, 47, 80, 53, 23, 59,  7, 15],
[83,  2, 40, 12, 22, 75, 69, 61, 28, 53]])
``````

Find the medoid, taking the last axis as the number of observations.

``````>>> import hdmedians as hd
>>> hd.medoid(X)
array([12, 51, 47, 42, 23, 69])
``````

Take the first axis as the number of observations.

``````>>> hd.medoid(X, axis=0)
array([39,  7, 50, 56, 29, 79, 47, 45, 10, 52])
``````

Since the medoid is one of the observations, the `medoid` function has the ability to only return the index if required.

``````>>> hd.medoid(X, indexonly=True)
6
>>> X[:,6]
array([12, 51, 47, 42, 23, 69])
``````

## Geometric Median

The geometric median is also known as the 1-median, spatial median, Euclidean minisum, or Torricelli point. Given a finite set of -dimensional observation vectors , the geometric median of these observations is given by

Note there is a subtle difference between the definition of the geometric median and the medoid: the search space for the solution differs and has the effect that the medoid returns one of the true observations whereas the geometric median can be described as a synthetic (not physically observed) observation.

The current implementation of `geomedian` uses Cython and can handle `float64` or `float32`. If you would like the algorithm to take care of missing values encoded as `nan` then you can use the `nangeomedian` function.

### Examples

Create an 6 x 10 array of random `float64` observations.

``````>>> import numpy as np
>>> np.set_printoptions(precision=4, linewidth=200)
>>> X = np.random.normal(1, size=(6, 10))
array([[ 1.1079,  0.5763,  0.3072,  1.2205,  0.8596, -1.5082,  2.5955,  2.8251,  1.5908,  0.4575],
[ 1.555 ,  1.7903,  1.213 ,  1.1285,  0.0461, -0.4929, -0.1158,  0.5879,  1.5807,  0.5828],
[ 2.1583,  3.4429,  0.4166,  1.0192,  0.8308, -0.1468,  2.6329,  2.2239,  0.2168,  0.8783],
[ 0.7382,  1.9453,  0.567 ,  0.6797,  1.1654, -0.1556,  0.9934,  0.1857,  1.369 ,  2.1855],
[ 0.1727,  0.0835,  0.5416,  1.4416,  1.6921,  1.6636,  1.6421,  1.0687,  0.6075, -0.0301],
[ 2.6654,  1.6741,  1.1568,  1.3092,  1.6944,  0.2574,  2.8604,  1.6102,  0.4301, -0.3876]])
>>> X.dtype
dtype('float64')
``````

Find the geometric median, taking the last axis as the number of observations.

``````>>> import hdmedians as hd
>>> np.array(hd.geomedian(X))
array([ 1.0733,  0.8974,  1.1935,  0.9122,  0.9975,  1.3422])
``````

Take the first axis as the number of observations.

``````>>> np.array(hd.geomedian(X, axis=0))
array([ 1.4581,  1.6377,  0.7147,  1.1257,  1.0493, -0.091 ,  1.7907,  1.4168,  0.9587,  0.6195])
``````

Convert to `float32` and compute the geometric median.

``````>>> X = X.astype(np.float32)
>>> m = hd.geomedian(X)
``````