Distances and divergences between distributions, implemented in Python.


License: MIT
Install: pip install dictances==1.5.6


Dictances


Distances and divergences between discrete distributions described as dictionaries, implemented in Python.

These are meant as fast ways to compute distances and divergences between discrete distributions, especially when the two distributions contain a significant number of events with nil probability, which are simply left out of the dictionaries.
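
The idea of leaving nil-probability events out of the dictionaries can be sketched in plain Python. The snippet below is only an illustration, not the package's actual implementation: the function name sparse_euclidean is hypothetical, and it assumes that a missing key stands for the value 0.

```python
from math import sqrt


def sparse_euclidean(p: dict, q: dict) -> float:
    """Euclidean distance between two sparse dictionaries.

    Assumption for this sketch: a key missing from a dictionary
    stands for the value 0.
    """
    total = 0.0
    # Keys present in p: compare against q's value, or 0 when absent.
    for key, value in p.items():
        total += (value - q.get(key, 0.0)) ** 2
    # Keys present only in q contribute their full squared value.
    for key, value in q.items():
        if key not in p:
            total += value ** 2
    return sqrt(total)


p = {"event_1": 0.5, "event_2": 0.5}
q = {"event_2": 0.5, "event_3": 0.5}
print(sparse_euclidean(p, q))
```

Only the keys that actually appear in one of the two dictionaries are visited, so the cost depends on the number of described events rather than on the size of the full event space.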

How do I install this package?

As usual, just install it using pip:

pip install dictances

Available metrics

A number of distances and divergences are available:

Distance                           Method
Bhattacharyya distance             bhattacharyya
Bhattacharyya coefficient          bhattacharyya_coefficient
Canberra distance                  canberra
Chebyshev distance                 chebyshev
Chi-square distance                chi_square
Cosine distance                    cosine
Euclidean distance                 euclidean
Hamming distance                   hamming
Jensen-Shannon divergence          jensen_shannon
Kullback-Leibler divergence        kullback_leibler
Mean absolute error                mae
Taxicab geometry                   manhattan, cityblock, total_variation
Minkowski distance                 minkowsky
Mean squared error                 mse
Pearson's distance                 pearson
Squared deviations from the mean   squared_variation

Usage example with points

Suppose you have a point described by my_first_dictionary and another one described by my_second_dictionary:

from dictances import cosine

my_first_dictionary = {
    "a": 56,
    "b": 34,
    "c": 89
}

my_second_dictionary = {
    "a": 21,
    "d": 51,
    "e": 74
}

cosine(my_first_dictionary, my_second_dictionary)
#>>> 0.8847005261889619
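
To make the number above easier to check by hand, here is a minimal plain-Python sketch of a cosine distance over dictionaries. It assumes the distance is one minus the cosine similarity and that missing keys count as 0; the name cosine_distance is hypothetical and this is not the package's own code.

```python
from math import sqrt


def cosine_distance(p: dict, q: dict) -> float:
    """One minus the cosine similarity, with missing keys treated as 0."""
    # Only keys shared by both dictionaries contribute to the dot product.
    dot = sum(value * q[key] for key, value in p.items() if key in q)
    norm_p = sqrt(sum(value ** 2 for value in p.values()))
    norm_q = sqrt(sum(value ** 2 for value in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


first = {"a": 56, "b": 34, "c": 89}
second = {"a": 21, "d": 51, "e": 74}
print(cosine_distance(first, second))
```

Here the only shared key is "a", so the dot product is 56 * 21 = 1176, and the result agrees with the value shown above up to floating-point rounding.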

Usage example with distributions

Suppose you have two discrete distributions described by the dictionaries a and b:

from dictances import bhattacharyya, bhattacharyya_coefficient

a = {
    "event_1": 0.4,
    "event_2": 0.1,
    "event_3": 0.2,
    "event_4": 0.3,
}
b = {
    "event_1": 0.1,
    "event_2": 0.2,
    "event_5": 0.2,
    "event_9": 0.5,
}

bhattacharyya_coefficient(a, b)
#>>> 0.3414213562373095
bhattacharyya(a, b)
#>>> 1.07463791569453
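
The two values above can be reproduced with a short plain-Python sketch. It assumes the usual definitions, namely the coefficient as the sum of sqrt(p_i * q_i) over events and the distance as the negative natural logarithm of that coefficient; the function names are hypothetical and the code is not taken from the package.

```python
from math import log, sqrt


def bhattacharyya_coefficient_sketch(p: dict, q: dict) -> float:
    """Sum of sqrt(p_i * q_i); events missing from either side add 0."""
    return sum(sqrt(value * q[key]) for key, value in p.items() if key in q)


def bhattacharyya_sketch(p: dict, q: dict) -> float:
    """Negative natural log of the Bhattacharyya coefficient."""
    return -log(bhattacharyya_coefficient_sketch(p, q))


a = {"event_1": 0.4, "event_2": 0.1, "event_3": 0.2, "event_4": 0.3}
b = {"event_1": 0.1, "event_2": 0.2, "event_5": 0.2, "event_9": 0.5}

print(bhattacharyya_coefficient_sketch(a, b))
print(bhattacharyya_sketch(a, b))
```

Only event_1 and event_2 appear in both dictionaries, so the coefficient reduces to sqrt(0.04) + sqrt(0.02), which is roughly 0.3414.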

Handling nested dictionaries

If you need to compute the distance between two nested dictionaries you can use deflate_dict as follows:

from dictances import cosine
from deflate_dict import deflate

my_first_dictionary = {
    "a": 8,
    "b": {
        "c": 3,
        "d": 6
    }
}

my_second_dictionary = {
    "b": {
        "c": 8,
        "d": 1
    },
    "y": 3
}

cosine(deflate(my_first_dictionary), deflate(my_second_dictionary))
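
For intuition about what flattening does before the distance is computed, here is a minimal sketch of a recursive flattener. It is not deflate_dict's implementation: the helper name flatten and the "_" separator are assumptions made for illustration only.

```python
def flatten(nested: dict, prefix: str = "", sep: str = "_") -> dict:
    """Turn a nested dictionary into a flat one with joined keys.

    The separator is an assumption of this sketch, not a documented
    property of deflate_dict.
    """
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


print(flatten({"a": 8, "b": {"c": 3, "d": 6}}))
# {'a': 8, 'b_c': 3, 'b_d': 6}
```

Once both dictionaries are flat, nested entries with the same key path end up under the same flat key, so any of the metrics above can compare them directly.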