This is a package for computing distances among observations of statistical variables, such as: Euclidean, Minkowski, Canberra, Pearson, Mahalanobis, Robust Mahalanobis, Gower, Generalized Gower and Related Metric Scaling (RelMS). A total of 41 statistical distances can be computed.


License
MIT
Install
pip install PyDistances==0.0.18

Documentation

PyDistances: A Statistical Distances Python Package

Installation

pip install PyDistances

Example of use

import itertools

import numpy as np
import pandas as pd

import PyDistances
from PyDistances import (
    Euclidean_Dist, Euclidean_Dist_Matrix,
    Minkowski_Dist, Minkowski_Dist_Matrix,
    Canberra_Dist, Canberra_Dist_Matrix,
    Pearson_Dist, Pearson_Dist_Matrix,
    Mahalanobis_Dist, Mahalanobis_Dist_Matrix,
    a_b_c_d_Matrix,
    Sokal_Similarity, Sokal_Dist, Sokal_Dist_Matrix,
    Jaccard_Similarity, Jaccard_Dist, Jaccard_Dist_Matrix,
    alpha,
    Matching_Similarity, Matching_Dist, Matching_Dist_Matrix,
    Gower_Similarity_Matrix, Gower_Dist_Matrix,
    Robust_Mahalanobis_Dist, Robust_Mahalanobis_Dist_Matrix,
    GeneralizedGowerDistance,
)

Getting data

We load the data set used throughout this tutorial. It is available at the following link: https://github.com/FabioScielzoOrtiz/Distances_Package/blob/master/Tests/House_Price.csv

Data = pd.read_csv('House_Price.csv')
# Keep rows with labels 0 to 150 (.loc slicing is inclusive) and the ten columns used below.
Data = Data.loc[0:150, ['latitude', 'longitude', 'price', 'size_in_m_2', 'balcony_recode', 'private_garden_recode', 'private_gym_recode', 'quality_recode', 'no_of_bathrooms', 'no_of_bedrooms']]
# Split the columns by variable type: quantitative, binary and multiclass.
Data_quant = Data.loc[:,['latitude', 'longitude', 'price', 'size_in_m_2']]
Data_binary = Data.loc[:,['balcony_recode', 'private_garden_recode', 'private_gym_recode']]
Data_multiclass = Data.loc[:,['quality_recode', 'no_of_bathrooms', 'no_of_bedrooms']]
Data.head() # p1=4 quantitative, p2=3 binary, p3=3 multiclass variables
latitude  longitude  price       size_in_m_2  balcony  private_garden  private_gym  quality  no_of_bathrooms  no_of_bedrooms
25.1132   55.1389    2.7e+06     100.242      1        0               0            2        2                1
25.1068   55.1512    2.85e+06    146.973      1        0               0            2        2                2
25.0633   55.1377    1.15e+06    181.254      1        0               0            2        5                3
25.2273   55.3418    2.85e+06    187.664      1        0               0            1        3                2
25.1143   55.1398    1.7292e+06  47.1018      0        0               0            2        1                0

Computing Euclidean distance

We compute the Euclidean distance between the observation at index 0 and itself.

Euclidean_Dist(Data_quant.iloc[0,:], Data_quant.iloc[0,:])
 0.0

We compute the Euclidean distance between the observations at index 0 and index 2.

Euclidean_Dist(Data_quant.iloc[0,:], Data_quant.iloc[2,:])
 1550000.002117049

We compute the Euclidean distance matrix for the data set Data_quant.

Euclidean_Dist_Matrix(Data_quant)
array([[       0.        ,   150000.00727904,  1550000.00211705, ...,
         1500000.00009635,  2700000.01899102, 12100000.00553371],
       [  150000.00727904,        0.        ,  1700000.00034565, ...,
         1650000.00026782,  2550000.0146678 , 11950000.00426352],
       [ 1550000.00211705,  1700000.00034565,        0.        , ...,
           50000.040973  ,  4250000.00673279, 13650000.00297389],
       ...,
       [ 1500000.00009635,  1650000.00026782,    50000.040973  , ...,
               0.        ,  4200000.01094663, 13600000.00447653],
       [ 2700000.01899102,  2550000.0146678 ,  4250000.00673279, ...,
         4200000.01094663,        0.        ,  9400000.00011113],
       [12100000.00553371, 11950000.00426352, 13650000.00297389, ...,
        13600000.00447653,  9400000.00011113,        0.        ]])
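
As a cross-check, the pairwise value above can be reproduced with plain NumPy. The sketch below assumes the textbook definition of the Euclidean distance and uses the rounded values printed by Data.head(), so it agrees with the package output only up to rounding. The helper name `euclidean` is hypothetical and is not part of PyDistances.

```python
import numpy as np

# Rows 0 and 2 of Data_quant, copied from the rounded Data.head() output above.
x0 = np.array([25.1132, 55.1389, 2.7e6, 100.242])
x2 = np.array([25.0633, 55.1377, 1.15e6, 181.254])

def euclidean(x, y):
    """Textbook Euclidean distance: square root of the sum of squared differences."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

print(euclidean(x0, x0))  # 0.0
print(euclidean(x0, x2))  # close to the 1550000.0021 reported above
```

The price column dominates the result because the four variables are measured on very different scales.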

Now we repeat the same procedure with the other distances available in PyDistances.


Computing Minkowski distance

Minkowski_Dist(Data_quant.iloc[0,:], Data_quant.iloc[0,:], q=1)
 0.0
Minkowski_Dist(Data_quant.iloc[0,:], Data_quant.iloc[2,:], q=1)
 1550081.062526
Minkowski_Dist_Matrix(Data_quant, q=1)
array([[       0.      ,   150046.748877,  1550081.062526, ...,
         1500017.050769,  2700320.266531, 12100365.997115],
       [  150046.748877,        0.      ,  1700034.338187, ...,
         1650029.78435 ,  2550273.554024, 11950319.272776],
       [ 1550081.062526,  1700034.338187,        0.      , ...,
           50064.027555,  4250239.302851, 13650284.955165],
       ...,
       [ 1500017.050769,  1650029.78435 ,    50064.027555, ...,
               0.      ,  4200303.29563 , 13600348.947944],
       [ 2700320.266531,  2550273.554024,  4250239.302851, ...,
         4200303.29563 ,        0.      ,  9400045.764238],
       [12100365.997115, 11950319.272776, 13650284.955165, ...,
        13600348.947944,  9400045.764238,        0.      ]])
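
For reference, the Minkowski distance of order q can be sketched in a few lines of NumPy. This assumes the standard definition, (sum of |x_k - y_k|^q)^(1/q), and reuses the rounded values shown by Data.head(); the function name `minkowski` is hypothetical and not part of PyDistances.

```python
import numpy as np

def minkowski(x, y, q=1):
    """Minkowski distance of order q: (sum_k |x_k - y_k|^q)^(1/q)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** q) ** (1.0 / q))

# Rows 0 and 2 of Data_quant, copied from the rounded Data.head() output.
x0 = np.array([25.1132, 55.1389, 2.7e6, 100.242])
x2 = np.array([25.0633, 55.1377, 1.15e6, 181.254])

print(minkowski(x0, x2, q=1))  # about 1550081.06 (q=1 is the Manhattan distance)
print(minkowski(x0, x2, q=2))  # q=2 recovers the Euclidean distance
```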

Computing Canberra distance

Canberra_Dist(Data_quant.iloc[0,:], Data_quant.iloc[0,:])
  0.0
Canberra_Dist(Data_quant.iloc[0,:], Data_quant.iloc[2,:])
 0.6913917083019879
Canberra_Dist_Matrix(Data_quant)
array([[0.        , 0.21629237, 0.69139171, ..., 0.463675  , 0.9485963 ,
        1.33838751],
       [0.21629237, 0.        , 0.53043317, ..., 0.52079671, 0.79157752,
        1.19854721],
       [0.69139171, 0.53043317, 0.        , ..., 0.23597883, 1.04765637,
        1.29619958],
       ...,
       [0.463675  , 0.52079671, 0.23597883, ..., 0.        , 1.20126891,
        1.44813664],
       [0.9485963 , 0.79157752, 1.04765637, ..., 1.20126891, 0.        ,
        0.51782969],
       [1.33838751, 1.19854721, 1.29619958, ..., 1.44813664, 0.51782969,
        0.        ]])
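
The Canberra distance divides each absolute difference by the sum of the absolute values, which makes it far less sensitive to scale than the two previous distances. A minimal NumPy sketch follows, assuming the standard definition; the name `canberra` and the convention that a 0/0 term contributes 0 are assumptions, not documented PyDistances behaviour.

```python
import numpy as np

def canberra(x, y):
    """Canberra distance: sum_k |x_k - y_k| / (|x_k| + |y_k|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    # Assumed convention: terms with a zero denominator contribute 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(terms.sum())

# Rows 0 and 2 of Data_quant, copied from the rounded Data.head() output.
x0 = np.array([25.1132, 55.1389, 2.7e6, 100.242])
x2 = np.array([25.0633, 55.1377, 1.15e6, 181.254])

print(canberra(x0, x2))  # close to the 0.6914 reported above
```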

Computing Pearson distance

Pearson_Dist(Data_quant.iloc[0,:], Data_quant.iloc[0,:], variance=Data_quant.var())
 0.0
Pearson_Dist(Data_quant.iloc[0,:], Data_quant.iloc[2,:], variance=Data_quant.var())
 1.5393297661160206
Pearson_Dist_Matrix(Data_quant)
array([[0.        , 0.63961801, 1.53932977, ..., 1.03084131, 4.32943281,
        7.47171915],
       [0.63961801, 0.        , 1.20505141, ..., 1.09780711, 3.76643257,
        7.04893716],
       [1.53932977, 1.20505141, 0.        , ..., 0.84617436, 3.79891055,
        7.4670243 ],
       ...,
       [1.03084131, 1.09780711, 0.84617436, ..., 0.        , 4.44143053,
        7.87905955],
       [4.32943281, 3.76643257, 3.79891055, ..., 4.44143053, 0.        ,
        4.57460318],
       [7.47171915, 7.04893716, 7.4670243 , ..., 7.87905955, 4.57460318,
        0.        ]])
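
The `variance` argument suggests that the Pearson distance is a Euclidean distance in which each squared difference is divided by the variance of its variable. The sketch below assumes that definition and runs on synthetic data, so it does not reproduce the numbers above; `pearson_distance` is a hypothetical name, not the package function.

```python
import numpy as np

def pearson_distance(x, y, variance):
    """Variance-standardized Euclidean distance: sqrt(sum_k (x_k - y_k)^2 / s_k^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2 / np.asarray(variance, dtype=float))))

# Synthetic data with columns on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([1.0, 100.0, 1e4])
var = X.var(axis=0, ddof=1)  # ddof=1 matches the default of pandas' .var()
d_original = pearson_distance(X[0], X[1], var)

# Changing the unit of one column leaves the distance unchanged.
X_rescaled = X.copy()
X_rescaled[:, 1] *= 1000.0
d_rescaled = pearson_distance(X_rescaled[0], X_rescaled[1], X_rescaled.var(axis=0, ddof=1))
print(d_original, d_rescaled)
```

Unlike the Euclidean distance, this quantity does not change when a variable is expressed in different units.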

Computing Mahalanobis distance

Mahalanobis_Dist(Data_quant.iloc[0,:], Data_quant.iloc[0,:], S_inv=np.linalg.inv( np.cov(Data_quant , rowvar=False) ))
 0.0
Mahalanobis_Dist(Data_quant.iloc[0,:], Data_quant.iloc[2,:], S_inv=np.linalg.inv( np.cov(Data_quant , rowvar=False) ))
  2.7671855371187757
Mahalanobis_Dist_Matrix(Data_quant)
array([[0.        , 0.92801614, 2.76718554, ..., 1.52541554, 5.21105193,
        6.45997793],
       [0.92801614, 0.        , 1.96135599, ..., 0.98693199, 4.43479282,
        6.2920865 ],
       [2.76718554, 1.96135599, 0.        , ..., 1.3592188 , 3.4307313 ,
        7.27986558],
       ...,
       [1.52541554, 0.98693199, 1.3592188 , ..., 0.        , 4.41360406,
        7.01503103],
       [5.21105193, 4.43479282, 3.4307313 , ..., 4.41360406, 0.        ,
        7.4691448 ],
       [6.45997793, 6.2920865 , 7.27986558, ..., 7.01503103, 7.4691448 ,
        0.        ]])
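
The `S_inv` argument is the inverse of the sample covariance matrix. A minimal sketch of the standard Mahalanobis formula, sqrt((x - y)' S^-1 (x - y)), is given below on synthetic data; `mahalanobis` is a hypothetical helper and is not claimed to match the package internals.

```python
import numpy as np

def mahalanobis(x, y, S_inv):
    """Mahalanobis distance: sqrt((x - y)^T S^{-1} (x - y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ S_inv @ diff))

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(100, 3)) @ mix
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d_x = mahalanobis(X[0], X[1], S_inv)

# The distance is unchanged by an invertible linear transformation of the variables.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.5, 0.1]])
Y = X @ A
d_y = mahalanobis(Y[0], Y[1], np.linalg.inv(np.cov(Y, rowvar=False)))
print(d_x, d_y)
```

Because the covariance matrix is taken into account, correlated variables are not counted twice, which the Pearson distance cannot guarantee.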

Computing Sokal similarity

a,b,c,d,p = a_b_c_d_Matrix(Data_binary)
Sokal_Similarity(i=0, r=2, a=a, d=d, p=p)
 1.0
Sokal_Dist(i=0, r=2, a=a, d=d, p=p)
 0.0
Sokal_Dist_Matrix(Data_binary)
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.81649658],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.81649658],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.81649658],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.81649658],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.81649658],
       [0.81649658, 0.81649658, 0.81649658, ..., 0.81649658, 0.81649658,
        0.        ]])
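
The counts a, b, c and d used above can be sketched with matrix products. The code below assumes the usual meaning of these counts for binary data (a: both 1, d: both 0, b and c: mismatches), the Sokal-Michener similarity (a + d) / p, and the transformation distance = sqrt(2 (1 - similarity)). That transformation is an assumption, although it agrees with the printed value 0.81649658 for one mismatch out of three variables. All function names are hypothetical.

```python
import numpy as np

def binary_counts(X):
    """Pairwise match/mismatch counts for a binary data matrix (rows = observations)."""
    X = np.asarray(X, dtype=int)
    Xc = 1 - X
    a = X @ X.T     # both observations equal 1
    b = X @ Xc.T    # first is 1, second is 0
    c = Xc @ X.T    # first is 0, second is 1
    d = Xc @ Xc.T   # both observations equal 0
    return a, b, c, d, X.shape[1]

def sokal_distance_matrix(X):
    """Sokal-Michener similarity (a + d) / p turned into a distance (assumed transform)."""
    a, b, c, d, p = binary_counts(X)
    S = (a + d) / p
    return np.sqrt(np.clip(2.0 * (1.0 - S), 0.0, None))

# Small illustrative data set: the last row differs from the others in one variable.
X_bin = np.array([[1, 0, 0],
                  [1, 0, 0],
                  [1, 1, 0]])
D_sokal = sokal_distance_matrix(X_bin)
print(D_sokal)
```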

Computing Jaccard similarity

Jaccard_Similarity(i=0, r=2, a=a, d=d, p=p)
  1.0
Jaccard_Dist(i=0, r=2, a=a, d=d, p=p)
 0.0
Jaccard_Dist_Matrix(Data_binary)
array([[0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       ...,
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       [1., 1., 1., ..., 1., 1., 0.]])
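
The Jaccard coefficient ignores joint absences: it is a / (a + b + c). The sketch below assumes that definition and the same sqrt(2 (1 - similarity)) transformation, which is consistent with the value 1 in the matrix above when one of two relevant variables matches. Treating two all-zero observations as fully similar is an additional assumption of this sketch, not documented package behaviour.

```python
import numpy as np

def jaccard_similarity(x, y):
    """Jaccard coefficient a / (a + b + c) for two binary vectors."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    a = int(np.sum(x & y))    # both 1
    b = int(np.sum(x & ~y))   # 1 in x only
    c = int(np.sum(~x & y))   # 1 in y only
    if a + b + c == 0:
        return 1.0  # assumed convention when both vectors are all zeros
    return a / (a + b + c)

def jaccard_distance(x, y):
    """Distance derived from the similarity (assumed transform)."""
    return float(np.sqrt(2.0 * (1.0 - jaccard_similarity(x, y))))

print(jaccard_similarity([1, 0, 0], [1, 0, 0]))  # identical vectors
print(jaccard_distance([1, 0, 0], [1, 1, 0]))    # one shared 1, one mismatch
```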

Computing Matching similarity

Matching_Similarity(x_i=Data_multiclass.iloc[0,:], x_r=Data_multiclass.iloc[2,:], Data=Data_multiclass)
0.3333333333333333
Matching_Dist(x_i=Data_multiclass.iloc[0,:], x_r=Data_multiclass.iloc[2,:], Data=Data_multiclass)
   1.1547005383792517
Matching_Dist_Matrix(Data_multiclass)
array([[0.        , 0.81649658, 1.15470054, ..., 0.81649658, 1.15470054,
        1.41421356],
       [0.81649658, 0.        , 1.15470054, ..., 0.        , 1.15470054,
        1.41421356],
       [1.15470054, 1.15470054, 0.        , ..., 1.15470054, 0.81649658,
        1.15470054],
       ...,
       [0.81649658, 0.        , 1.15470054, ..., 0.        , 1.15470054,
        1.41421356],
       [1.15470054, 1.15470054, 0.81649658, ..., 1.15470054, 0.        ,
        1.15470054],
       [1.41421356, 1.41421356, 1.15470054, ..., 1.41421356, 1.15470054,
        0.        ]])
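
The two scalar results above are consistent with a simple matching coefficient (the fraction of variables on which both observations take the same category) and with distance = sqrt(2 (1 - similarity)). The following sketch assumes exactly that and uses the multiclass values of rows 0 and 2 shown by Data.head(); the function names are hypothetical.

```python
import numpy as np

def matching_similarity(x, y):
    """Fraction of categorical variables on which x and y agree."""
    return float(np.mean(np.asarray(x) == np.asarray(y)))

def matching_distance(x, y):
    """Distance derived from the similarity (assumed transform)."""
    return float(np.sqrt(2.0 * (1.0 - matching_similarity(x, y))))

# quality, no_of_bathrooms, no_of_bedrooms for rows 0 and 2 of the table above.
m0 = [2, 2, 1]
m2 = [2, 5, 3]

print(matching_similarity(m0, m2))  # one match out of three variables
print(matching_distance(m0, m2))    # close to the 1.1547 reported above
```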

Computing Gower distance

The theoretical framework follows Gower (1971).

Gower_Similarity_Matrix(Data, p1=4, p2=3, p3=3)
array([[1.        , 0.85175283, 0.68485131, ..., 0.83008431, 0.62482353,
        0.34709882],
       [0.85175283, 1.        , 0.69489168, ..., 0.94863663, 0.63064768,
        0.35833279],
       [0.68485131, 0.69489168, 1.        , ..., 0.72293677, 0.73120218,
        0.48172501],
       ...,
       [0.83008431, 0.94863663, 0.72293677, ..., 1.        , 0.59776459,
        0.36311382],
       [0.62482353, 0.63064768, 0.73120218, ..., 0.59776459, 1.        ,
        0.55654437],
       [0.34709882, 0.35833279, 0.48172501, ..., 0.36311382, 0.55654437,
        1.        ]])
Gower_Dist_Matrix(Data, p1=4, p2=3, p3=3)
array([[0.        , 0.38502879, 0.56138105, ..., 0.41220831, 0.61251651,
        0.808023  ],
       [0.38502879, 0.        , 0.55236611, ..., 0.22663488, 0.60774363,
        0.80104133],
       [0.56138105, 0.55236611, 0.        , ..., 0.52636796, 0.51845716,
        0.71991318],
       ...,
       [0.41220831, 0.22663488, 0.52636796, ..., 0.        , 0.63422032,
        0.79805149],
       [0.61251651, 0.60774363, 0.51845716, ..., 0.63422032, 0.        ,
        0.66592464],
       [0.808023  , 0.80104133, 0.71991318, ..., 0.79805149, 0.66592464,
        0.        ]])
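
The two matrices above are related by distance = sqrt(1 - similarity) (for example, sqrt(1 - 0.85175283) is 0.38502879). A sketch of Gower's similarity for a single pair of mixed observations follows. It assumes the common formulation with range-scaled quantitative terms, positive matches a and negative matches d for the binary block, and matches alpha for the multiclass block. The data, the column order (quantitative, then binary, then multiclass) and the function name are illustrative assumptions.

```python
import numpy as np

def gower_similarity(x_i, x_r, ranges, p1, p2, p3):
    """Gower similarity for observations with p1 quantitative, p2 binary, p3 multiclass variables."""
    x_i = np.asarray(x_i, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    q_i, q_r = x_i[:p1], x_r[:p1]
    b_i, b_r = x_i[p1:p1 + p2], x_r[p1:p1 + p2]
    m_i, m_r = x_i[p1 + p2:p1 + p2 + p3], x_r[p1 + p2:p1 + p2 + p3]
    quant = np.sum(1.0 - np.abs(q_i - q_r) / ranges)  # range-scaled agreement
    a = np.sum((b_i == 1) & (b_r == 1))               # positive matches
    d = np.sum((b_i == 0) & (b_r == 0))               # negative matches (excluded from the denominator)
    alpha = np.sum(m_i == m_r)                        # multiclass matches
    return float((quant + a + alpha) / (p1 + (p2 - d) + p3))

# Illustrative data: 2 quantitative, 2 binary and 1 multiclass variable.
X_mixed = np.array([[1.0, 10.0, 1, 0, 2],
                    [3.0, 20.0, 1, 1, 2],
                    [5.0, 30.0, 0, 0, 1]])
ranges = X_mixed[:, :2].max(axis=0) - X_mixed[:, :2].min(axis=0)

s_01 = gower_similarity(X_mixed[0], X_mixed[1], ranges, p1=2, p2=2, p3=1)
print(s_01, np.sqrt(1.0 - s_01))  # similarity and the corresponding distance
```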

Computing Robust Mahalanobis distance

From a theoretical perspective, Gnanadesikan (1997) and Devlin et al. (1975) have been followed.

Robust_Mahalanobis_Dist(x_i=Data_quant.iloc[0,:], x_r=Data_quant.iloc[2,:], Data=Data_quant, Method='MAD', epsilon=0.05, n_iters=20)
 2.1448247626892223
Robust_Mahalanobis_Dist(x_i=Data_quant.iloc[0,:], x_r=Data_quant.iloc[2,:], Data=Data_quant, Method='trimmed', alpha=0.1, epsilon=0.05, n_iters=20)
 2.7434709885399884
Robust_Mahalanobis_Dist(x_i=Data_quant.iloc[0,:], x_r=Data_quant.iloc[2,:], Data=Data_quant, Method='winsorized', alpha=0.1, epsilon=0.05, n_iters=20)
 2.8446274140577943
Robust_Mahalanobis_Dist_Matrix(Data=Data_quant, Method='trimmed', alpha=0.1, epsilon=0.05, n_iters=20)
array([[ 0.        ,  0.89250845,  2.74347099, ...,  1.48503889,
         5.95276234,  8.49453068],
       [ 0.89250845,  0.        ,  1.99959936, ...,  0.96839524,
         5.33355737,  8.32070442],
       [ 2.74347099,  1.99959936,  0.        , ...,  1.36336733,
         4.12306341,  9.38094479],
       ...,
       [ 1.48503889,  0.96839524,  1.36336733, ...,  0.        ,
         5.1322854 ,  9.00337923],
       [ 5.95276234,  5.33355737,  4.12306341, ...,  5.1322854 ,
         0.        , 11.06785954],
       [ 8.49453068,  8.32070442,  9.38094479, ...,  9.00337923,
        11.06785954,  0.        ]])
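
A robust Mahalanobis distance replaces the sample covariance matrix with a robust estimate. One classical way to build such an estimate, sketched below, is the pairwise identity cov(X, Y) = (var(X + Y) - var(X - Y)) / 4 with a robust variance plugged in, here the squared, normal-consistent MAD. Whether PyDistances builds its estimate exactly this way is an assumption; the function names are hypothetical and the sketch does not attempt the iterative correction controlled by `epsilon` and `n_iters`.

```python
import numpy as np

def mad_variance(v):
    """Robust variance: (1.4826 * median absolute deviation)^2."""
    v = np.asarray(v, dtype=float)
    mad = np.median(np.abs(v - np.median(v)))
    return (1.4826 * mad) ** 2

def pairwise_robust_covariance(X):
    """Robust covariance matrix from the identity cov = (var(X+Y) - var(X-Y)) / 4."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            plus = mad_variance(X[:, j] + X[:, k])
            minus = mad_variance(X[:, j] - X[:, k])
            S[j, k] = S[k, j] = (plus - minus) / 4.0
    return S

# Synthetic data with a few gross outliers (illustrative only).
rng = np.random.default_rng(0)
X_out = rng.normal(size=(200, 3))
X_out[:5] += 50.0
S_robust = pairwise_robust_covariance(X_out)
print(S_robust)
print(np.cov(X_out, rowvar=False))  # the classical estimate is inflated by the outliers
```

A matrix built pairwise in this way is symmetric but is not guaranteed to be positive definite, so it may need a correction before it is inverted.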

Computing Generalized Gower distance and Related Metric Scaling

To end this tutorial, we compute both the Generalized Gower distance matrix and the Related Metric Scaling matrix for the mixed data set Data, considering all possible combinations of the quantitative, binary and multiclass distances. We then save all the resulting matrices in a Python dictionary.

From a theoretical perspective we have followed Cuadras and Fortiana (1998), Albarrán et al. (2015) and Grané et al. (2021).
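
To give an idea of how distances of different types can be combined, the sketch below scales each squared-distance matrix by its geometric variability, (1 / (2 n^2)) times the sum of all squared distances, and then adds the scaled matrices. This is one common way of writing a generalized Gower distance and is an assumption about the cited literature; it is not a description of the GeneralizedGowerDistance class. Related Metric Scaling, which also removes redundant information shared between the blocks, is not attempted here. All names and data are illustrative.

```python
import numpy as np

def geometric_variability(D2):
    """Geometric variability of a squared-distance matrix: sum(D2) / (2 n^2)."""
    n = D2.shape[0]
    return D2.sum() / (2.0 * n ** 2)

def combine_squared_distances(D2_list):
    """Sum of squared-distance matrices, each scaled to geometric variability 1."""
    D2_combined = sum(D2 / geometric_variability(D2) for D2 in D2_list)
    return np.sqrt(D2_combined), D2_combined

# Synthetic mixed data: two quantitative and two categorical variables.
rng = np.random.default_rng(0)
X_quant = rng.normal(size=(20, 2)) * np.array([1.0, 1000.0])
X_cat = rng.integers(0, 3, size=(20, 2))

# Squared Euclidean distances for the quantitative block.
diff = X_quant[:, None, :] - X_quant[None, :, :]
D2_quant = np.sum(diff ** 2, axis=-1)
# Squared matching distances, 2 * (1 - similarity), for the categorical block.
D2_cat = 2.0 * (1.0 - np.mean(X_cat[:, None, :] == X_cat[None, :, :], axis=-1))

D_comb, D2_comb = combine_squared_distances([D2_quant, D2_cat])
print(geometric_variability(D2_comb))  # one unit per block
```

Scaling by the geometric variability puts the blocks on a comparable footing, so that no single type of variable dominates the combined distance merely because of its units.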

d1_list = ['Euclidean', 'Minkowski', 'Canberra', 'Pearson', 'Mahalanobis']
d2_list = ['Sokal', 'Jaccard']
d3_list = ['Matching']
robust_methods = ['trimmed', 'winsorized', 'MAD']

# All (d1, d2, d3) combinations, plus (d1, d2, d3, Method) for Robust Mahalanobis.
combos_not_maha_robust = list(itertools.product(d1_list, d2_list, d3_list))
combos_maha_robust = list(itertools.product(['Robust_Mahalanobis'], d2_list, d3_list, robust_methods))

D_GG_list_not_maha_robust = []
D_GG_list_maha_robust = []
D_RelMS_list_not_maha_robust = []
D_RelMS_list_maha_robust = []

# Generalized Gower distance matrices.
for combo in combos_not_maha_robust:
    Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1=combo[0], d2=combo[1], d3=combo[2], q=1)
    D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=False)
    D_GG_list_not_maha_robust.append(D)
for combo in combos_maha_robust:
    Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1=combo[0], d2=combo[1], d3=combo[2], epsilon=0.05, Method=combo[3], alpha=0.1)
    D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=False)
    D_GG_list_maha_robust.append(D)

# Related Metric Scaling matrices.
for combo in combos_not_maha_robust:
    Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1=combo[0], d2=combo[1], d3=combo[2], q=1)
    D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=True, tol=0.009, d=2)
    D_RelMS_list_not_maha_robust.append(D)
for combo in combos_maha_robust:
    Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1=combo[0], d2=combo[1], d3=combo[2], epsilon=0.05, Method=combo[3], alpha=0.1)
    D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=True, tol=0.009, d=2)
    D_RelMS_list_maha_robust.append(D)

D_GG_list = D_GG_list_not_maha_robust + D_GG_list_maha_robust
D_RelMS_list = D_RelMS_list_not_maha_robust + D_RelMS_list_maha_robust
search_space = D_GG_list + D_RelMS_list

# Names are built in the same order as search_space.
distance_names = (
    ['GG_' + '_'.join(combo) for combo in combos_not_maha_robust]
    + ['GG_' + '_'.join(combo) for combo in combos_maha_robust]
    + ['RelMS_' + '_'.join(combo) for combo in combos_not_maha_robust]
    + ['RelMS_' + '_'.join(combo) for combo in combos_maha_robust]
)
dic_distance_matrix = dict(zip(distance_names, search_space))
dic_distance_matrix
{'GG_Euclidean_Sokal_Matching': array([[0.        , 1.01161446, 1.60800698, ..., 1.23798333, 1.92432848,
         6.35838514],
        [1.01161446, 0.        , 1.64229596, ..., 0.7889253 , 1.87696727,
         6.29319748],
        [1.60800698, 1.64229596, 0.        , ..., 1.42723912, 2.26882579,
         6.96673669],
        ...,
        [1.23798333, 0.7889253 , 1.42723912, ..., 0.        , 2.4635748 ,
         7.01727531],
        [1.92432848, 1.87696727, 2.26882579, ..., 2.4635748 , 0.        ,
         5.11270638],
        [6.35838514, 6.29319748, 6.96673669, ..., 7.01727531, 5.11270638,
         0.        ]]),
 'GG_Euclidean_Jaccard_Matching': array([[0.        , 1.01161446, 1.60800698, ..., 1.23798333, 1.92432848,
         6.21923207],
        [1.01161446, 0.        , 1.64229596, ..., 0.7889253 , 1.87696727,
         6.15257024],
        [1.60800698, 1.64229596, 0.        , ..., 1.42723912, 2.26882579,
         6.83997121],
        ...,
        [1.23798333, 0.7889253 , 1.42723912, ..., 0.        , 2.4635748 ,
         6.89143953],
        [1.92432848, 1.87696727, 2.26882579, ..., 2.4635748 , 0.        ,
         4.93857798],
        [6.21923207, 6.15257024, 6.83997121, ..., 6.89143953, 4.93857798,
         0.        ]]),
 'GG_Minkowski_Sokal_Matching': array([[0.        , 1.01161589, 1.60801451, ..., 1.23797549, 1.92440501,
         6.35838512],
        [1.01161589, 0.        , 1.64229192, ..., 0.78891568, 1.87702827,
         6.29317915],
        [1.60801451, 1.64229192, 0.        , ..., 1.42723962, 2.2688732 ,
         6.96667937],
        ...,
        [1.23797549, 0.78891568, 1.42723962, ..., 0.        , 2.46364348,
         7.01724763],
        [1.92440501, 1.87702827, 2.2688732 , ..., 2.46364348, 0.        ,
         5.11260609],
        [6.35838512, 6.29317915, 6.96667937, ..., 7.01724763, 5.11260609,
         0.        ]]),
 'GG_Minkowski_Jaccard_Matching': array([[0.        , 1.01161589, 1.60801451, ..., 1.23797549, 1.92440501,
         6.21923205],
        [1.01161589, 0.        , 1.64229192, ..., 0.78891568, 1.87702827,
         6.15255149],
        [1.60801451, 1.64229192, 0.        , ..., 1.42723962, 2.2688732 ,
         6.83991282],
        ...,
        [1.23797549, 0.78891568, 1.42723962, ..., 0.        , 2.46364348,
         6.89141134],
        [1.92440501, 1.87702827, 2.2688732 , ..., 2.46364348, 0.        ,
         4.93847416],
        [6.21923205, 6.15255149, 6.83991282, ..., 6.89141134, 4.93847416,
         0.        ]]),
 'GG_Canberra_Sokal_Matching': array([[0.        , 1.1089173 , 2.04873576, ..., 1.41070641, 2.47064802,
         3.88007815],
        [1.1089173 , 0.        , 1.81887649, ..., 1.10728448, 2.20656591,
         3.66760203],
        [2.04873576, 1.81887649, 0.        , ..., 1.51266848, 2.44536222,
         3.67890583],
        ...,
        [1.41070641, 1.10728448, 1.51266848, ..., 0.        , 2.92569072,
         4.05431191],
        [2.47064802, 2.20656591, 2.44536222, ..., 2.92569072, 0.        ,
         2.67423498],
        [3.88007815, 3.66760203, 3.67890583, ..., 4.05431191, 2.67423498,
         0.        ]]),
 'GG_Canberra_Jaccard_Matching': array([[0.        , 1.1089173 , 2.04873576, ..., 1.41070641, 2.47064802,
         3.64757349],
        [1.1089173 , 0.        , 1.81887649, ..., 1.10728448, 2.20656591,
         3.42068569],
        [2.04873576, 1.81887649, 0.        , ..., 1.51266848, 2.44536222,
         3.43280265],
        ...,
        [1.41070641, 1.10728448, 1.51266848, ..., 0.        , 2.92569072,
         3.83239234],
        [2.47064802, 2.20656591, 2.44536222, ..., 2.92569072, 0.        ,
         2.32407372],
        [3.64757349, 3.42068569, 3.43280265, ..., 3.83239234, 2.32407372,
         0.        ]]),
 'GG_Pearson_Sokal_Matching': array([[0.        , 1.0588577 , 1.62258227, ..., 1.13386485, 2.59878376,
         4.5833716 ],
        [1.0588577 , 0.        , 1.54980561, ..., 0.55073019, 2.36782324,
         4.41160916],
        [1.62258227, 1.54980561, 0.        , ..., 1.48883715, 2.15643298,
         4.46893998],
        ...,
        [1.13386485, 0.55073019, 1.48883715, ..., 0.        , 2.64592015,
         4.75194328],
        [2.59878376, 2.36782324, 2.15643298, ..., 2.64592015, 0.        ,
         3.34753806],
        [4.5833716 , 4.41160916, 4.46893998, ..., 4.75194328, 3.34753806,
         0.        ]]),
 'GG_Pearson_Jaccard_Matching': array([[0.        , 1.0588577 , 1.62258227, ..., 1.13386485, 2.59878376,
         4.38828909],
        [1.0588577 , 0.        , 1.54980561, ..., 0.55073019, 2.36782324,
         4.20857237],
        [1.62258227, 1.54980561, 0.        , ..., 1.48883715, 2.15643298,
         4.26863098],
        ...,
        [1.13386485, 0.55073019, 1.48883715, ..., 0.        , 2.64592015,
         4.56407174],
        [2.59878376, 2.36782324, 2.15643298, ..., 2.64592015, 0.        ,
         3.07502796],
        [4.38828909, 4.20857237, 4.26863098, ..., 4.56407174, 3.07502796,
         0.        ]]),
 'GG_Mahalanobis_Sokal_Matching': array([[0.        , 1.11128701, 1.9908619 , ..., 1.26642065, 2.97833241,
         4.17851469],
        [1.11128701, 0.        , 1.73337267, ..., 0.49510815, 2.64311668,
         4.11353573],
        [1.9908619 , 1.73337267, 0.        , ..., 1.5815777 , 1.99507289,
         4.39053781],
        ...,
        [1.26642065, 0.49510815, 1.5815777 , ..., 0.        , 2.63417571,
         4.3979867 ],
        [2.97833241, 2.64311668, 1.99507289, ..., 2.63417571, 0.        ,
         4.4698317 ],
        [4.17851469, 4.11353573, 4.39053781, ..., 4.3979867 , 4.4698317 ,
         0.        ]]),
 'GG_Mahalanobis_Jaccard_Matching': array([[0.        , 1.11128701, 1.9908619 , ..., 1.26642065, 2.97833241,
         3.96355535],
        [1.11128701, 0.        , 1.73337267, ..., 0.49510815, 2.64311668,
         3.89499193],
        [1.9908619 , 1.73337267, 0.        , ..., 1.5815777 , 1.99507289,
         4.18647921],
        ...,
        [1.26642065, 0.49510815, 1.5815777 , ..., 0.        , 2.63417571,
         4.19429052],
        [2.97833241, 2.64311668, 1.99507289, ..., 2.63417571, 0.        ,
         4.26956454],
        [3.96355535, 3.89499193, 4.18647921, ..., 4.19429052, 4.26956454,
         0.        ]]),
 'GG_Robust_Mahalanobis_Sokal_Matching_trimmed': array([[0.        , 1.0738818 , 1.81990287, ..., 1.17982158, 2.83584093,
         4.38026385],
        [1.0738818 , 0.        , 1.64744788, ..., 0.39866732, 2.61869851,
         4.3233478 ],
        [1.81990287, 1.64744788, 0.        , ..., 1.53344794, 1.97466567,
         4.56660697],
        ...,
        [1.17982158, 0.39866732, 1.53344794, ..., 0.        , 2.54962302,
         4.5492545 ],
        [2.83584093, 2.61869851, 1.97466567, ..., 2.54962302, 0.        ,
         5.16721825],
        [4.38026385, 4.3233478 , 4.56660697, ..., 4.5492545 , 5.16721825,
         0.        ]]),
 'GG_Robust_Mahalanobis_Sokal_Matching_winsorized': array([[0.        , 1.10035027, 1.96521318, ..., 1.24876507, 3.02193061,
         4.2158267 ],
        [1.10035027, 0.        , 1.72244788, ..., 0.45786845, 2.71169847,
         4.170886  ],
        [1.96521318, 1.72244788, 0.        , ..., 1.57396145, 2.01907767,
         4.45138733],
        ...,
        [1.24876507, 0.45786845, 1.57396145, ..., 0.        , 2.6589383 ,
         4.42575055],
        [3.02193061, 2.71169847, 2.01907767, ..., 2.6589383 , 0.        ,
         4.74960743],
        [4.2158267 , 4.170886  , 4.45138733, ..., 4.42575055, 4.74960743,
         0.        ]]),
 'GG_Robust_Mahalanobis_Sokal_Matching_MAD': array([[0.        , 1.09006233, 1.80375514, ..., 1.18201607, 2.67497233,
         4.55678538],
        [1.09006233, 0.        , 1.62058379, ..., 0.44488228, 2.40606721,
         4.40232615],
        [1.80375514, 1.62058379, 0.        , ..., 1.53278692, 1.93813141,
         4.46679441],
        ...,
        [1.18201607, 0.44488228, 1.53278692, ..., 0.        , 2.48916367,
         4.64371521],
        [2.67497233, 2.40606721, 1.93813141, ..., 2.48916367, 0.        ,
         4.16671594],
        [4.55678538, 4.40232615, 4.46679441, ..., 4.64371521, 4.16671594,
         0.        ]]),
 'GG_Robust_Mahalanobis_Jaccard_Matching_trimmed': array([[0.        , 1.0738818 , 1.81990287, ..., 1.17982158, 2.83584093,
         4.17570322],
        [1.0738818 , 0.        , 1.64744788, ..., 0.39866732, 2.61869851,
         4.11595944],
        [1.81990287, 1.64744788, 0.        , ..., 1.53344794, 1.97466567,
         4.37077626],
        ...,
        [1.17982158, 0.39866732, 1.53344794, ..., 0.        , 2.54962302,
         4.35264315],
        [2.83584093, 2.61869851, 1.97466567, ..., 2.54962302, 0.        ,
         4.99499053],
        [4.17570322, 4.11595944, 4.37077626, ..., 4.35264315, 4.99499053,
         0.        ]]),
 'GG_Robust_Mahalanobis_Jaccard_Matching_winsorized': array([[0.        , 1.10035027, 1.96521318, ..., 1.24876507, 3.02193061,
         4.00287155],
        [1.10035027, 0.        , 1.72244788, ..., 0.45786845, 2.71169847,
         3.95551209],
        [1.96521318, 1.72244788, 0.        , ..., 1.57396145, 2.01907767,
         4.25025118],
        ...,
        [1.24876507, 0.45786845, 1.57396145, ..., 0.        , 2.6589383 ,
         4.22339365],
        [3.02193061, 2.71169847, 2.01907767, ..., 2.6589383 , 0.        ,
         4.5616397 ],
        [4.00287155, 3.95551209, 4.25025118, ..., 4.22339365, 4.5616397 ,
         0.        ]]),
 'GG_Robust_Mahalanobis_Jaccard_Matching_MAD': array([[0.        , 1.09006233, 1.80375514, ..., 1.18201607, 2.67497233,
         4.36051361],
        [1.09006233, 0.        , 1.62058379, ..., 0.44488228, 2.40606721,
         4.19884049],
        [1.80375514, 1.62058379, 0.        , ..., 1.53278692, 1.93813141,
         4.26638468],
        ...,
        [1.18201607, 0.44488228, 1.53278692, ..., 0.        , 2.48916367,
         4.45127812],
        [2.67497233, 2.40606721, 1.93813141, ..., 2.48916367, 0.        ,
         3.95111474],
        [4.36051361, 4.19884049, 4.26638468, ..., 4.45127812, 3.95111474,
         0.        ]]),
 'RelMS_Euclidean_Sokal_Matching': array([[0.        , 1.01092438, 1.68587263, ..., 1.2435966 , 1.75479379,
         5.76354972],
        [1.01092436, 0.        , 1.72123768, ..., 0.78892531, 1.71977376,
         5.69924943],
        [1.68587264, 1.7212377 , 0.        , ..., 1.42997022, 2.20660915,
         6.5504967 ],
        ...,
        [1.24359658, 0.78892532, 1.42997021, ..., 0.        , 2.26671431,
         6.42377887],
        [1.7547938 , 1.71977375, 2.20660914, ..., 2.26671431, 0.        ,
         4.781135  ],
        [5.76354972, 5.69924943, 6.55049671, ..., 6.42377887, 4.78113499,
         0.        ]]),
 'RelMS_Euclidean_Jaccard_Matching': array([[0.        , 1.01092435, 1.68587263, ..., 1.24359659, 1.75479381,
         5.73873464],
        [1.01092437, 0.        , 1.72123769, ..., 0.78892532, 1.71977378,
         5.67208311],
        [1.68587264, 1.72123769, 0.        , ..., 1.42997021, 2.20660914,
         6.53309456],
        ...,
        [1.24359658, 0.78892529, 1.42997021, ..., 0.        , 2.26671431,
         6.41402297],
        [1.7547938 , 1.71977375, 2.20660914, ..., 2.2667143 , 0.        ,
         4.6957284 ],
        [5.73873463, 5.67208312, 6.53309457, ..., 6.41402297, 4.69572838,
         0.        ]]),
 'RelMS_Minkowski_Sokal_Matching': array([[0.        , 1.0104344 , 1.68473307, ..., 1.24302039, 1.75451827,
         5.7636572 ],
        [1.01043437, 0.        , 1.72039524, ..., 0.78891568, 1.71978231,
         5.69946617],
        [1.68473308, 1.72039525, 0.        , ..., 1.42922921, 2.20651554,
         6.55109162],
        ...,
        [1.24302037, 0.7889157 , 1.4292292 , ..., 0.        , 2.2667207 ,
         6.42402052],
        [1.75451827, 1.71978229, 2.20651553, ..., 2.2667207 , 0.        ,
         4.78235997],
        [5.7636572 , 5.69946616, 6.55109161, ..., 6.42402052, 4.78235997,
         0.        ]]),
 'RelMS_Minkowski_Jaccard_Matching': array([[0.        , 1.01043437, 1.68473307, ..., 1.24302038, 1.75451828,
         5.73875343],
        [1.01043439, 0.        , 1.72039525, ..., 0.78891569, 1.71978232,
         5.67221733],
        [1.68473307, 1.72039524, 0.        , ..., 1.4292292 , 2.20651553,
         6.5336026 ],
        ...,
        [1.24302038, 0.78891568, 1.4292292 , ..., 0.        , 2.2667207 ,
         6.41417732],
        [1.75451828, 1.7197823 , 2.20651553, ..., 2.2667207 , 0.        ,
         4.6969009 ],
        [5.73875342, 5.67221732, 6.5336026 , ..., 6.41417732, 4.6969009 ,
         0.        ]]),
 'RelMS_Canberra_Sokal_Matching': array([[0.        , 3.29475825, 3.63767326, ..., 3.42002989, 3.78234978,
         4.28387746],
        [3.29475817, 0.        , 3.54627477, ..., 3.36365755, 3.64707779,
         4.11290306],
        [3.63767327, 3.5462748 , 0.        , ..., 3.36371231, 3.88636668,
         4.26421609],
        ...,
        [3.42002989, 3.36365756, 3.36371231, ..., 0.        , 4.08835735,
         4.43146723],
        [3.78234979, 3.64707779, 3.88636667, ..., 4.08835736, 0.        ,
         3.55682862],
        [4.28387745, 4.11290305, 4.26421607, ..., 4.43146723, 3.55682862,
         0.        ]]),
 'RelMS_Canberra_Jaccard_Matching': array([[0.        , 3.29475816, 3.63767325, ..., 3.42002988, 3.7823498 ,
         4.18398249],
        [3.29475818, 0.        , 3.54627479, ..., 3.36365756, 3.64707782,
         4.00084943],
        [3.63767326, 3.54627478, 0.        , ..., 3.36371229, 3.88636666,
         4.15092751],
        ...,
        [3.42002988, 3.36365755, 3.36371228, ..., 0.        , 4.08835736,
         4.3378168 ],
        [3.78234979, 3.64707778, 3.88636666, ..., 4.08835735, 0.        ,
         3.36218137],
        [4.18398248, 4.00084941, 4.15092752, ..., 4.3378168 , 3.36218137,
         0.        ]]),
 'RelMS_Pearson_Sokal_Matching': array([[0.        , 1.04250916, 1.57029271, ..., 1.11835441, 2.35030151,
         3.99961285],
        [1.04250913, 0.        , 1.55642417, ..., 0.55073019, 2.17276224,
         3.83629275],
        [1.5702927 , 1.55642418, 0.        , ..., 1.44481248, 2.11094744,
         4.05200057],
        ...,
        [1.11835439, 0.55073021, 1.44481248, ..., 0.        , 2.43447697,
         4.16544183],
        [2.35030151, 2.17276223, 2.11094745, ..., 2.43447697, 0.        ,
         3.00502738],
        [3.99961283, 3.83629274, 4.05200056, ..., 4.16544183, 3.00502738,
         0.        ]]),
 'RelMS_Pearson_Jaccard_Matching': array([[0.        , 1.04250913, 1.57029271, ..., 1.11835441, 2.35030152,
         3.89789603],
        [1.04250915, 0.        , 1.55642418, ..., 0.55073023, 2.17276226,
         3.72479069],
        [1.5702927 , 1.55642415, 0.        , ..., 1.44481247, 2.11094744,
         3.94329467],
        ...,
        [1.11835439, 0.55073016, 1.44481248, ..., 0.        , 2.43447698,
         4.07654071],
        [2.35030152, 2.17276223, 2.11094745, ..., 2.43447697, 0.        ,
         2.77842982],
        [3.89789601, 3.72479067, 3.94329467, ..., 4.0765407 , 2.77842982,
         0.        ]]),
 'RelMS_Mahalanobis_Sokal_Matching': array([[0.        , 1.0872495 , 1.91566724, ..., 1.23718333, 2.78694322,
         3.59368169],
        [1.08724948, 0.        , 1.72190382, ..., 0.49510814, 2.51013925,
         3.52430362],
        [1.91566725, 1.72190383, 0.        , ..., 1.53860587, 1.97114821,
         3.91897956],
        ...,
        [1.23718333, 0.49510818, 1.53860586, ..., 0.        , 2.47401146,
         3.7944967 ],
        [2.78694323, 2.51013924, 1.97114821, ..., 2.47401146, 0.        ,
         4.10401609],
        [3.59368167, 3.52430361, 3.91897955, ..., 3.7944967 , 4.10401609,
         0.        ]]),
 'RelMS_Mahalanobis_Jaccard_Matching': array([[0.        , 1.08724947, 1.91566724, ..., 1.23718333, 2.78694323,
         3.46907215],
        [1.0872495 , 0.        , 1.72190383, ..., 0.49510817, 2.51013926,
         3.39550188],
        [1.91566724, 1.72190381, 0.        , ..., 1.53860586, 1.97114821,
         3.80535063],
        ...,
        [1.23718333, 0.49510812, 1.53860586, ..., 0.        , 2.47401147,
         3.68911387],
        [2.78694323, 2.51013924, 1.97114821, ..., 2.47401147, 0.        ,
         3.96214705],
        [3.46907213, 3.39550187, 3.80535063, ..., 3.68911387, 3.96214705,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Sokal_Matching_trimmed': array([[0.        , 1.05396495, 1.74951184, ..., 1.15390312, 2.67058462,
         3.82780883],
        [1.05396493, 0.        , 1.63479812, ..., 0.39866731, 2.51224528,
         3.76362714],
        [1.74951185, 1.63479814, 0.        , ..., 1.49657109, 1.961588  ,
         4.09825745],
        ...,
        [1.15390311, 0.39866735, 1.49657109, ..., 0.        , 2.41854434,
         3.97375586],
        [2.67058463, 2.51224527, 1.961588  , ..., 2.41854434, 0.        ,
         4.81269468],
        [3.82780882, 3.76362713, 4.09825744, ..., 3.97375586, 4.81269468,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Sokal_Matching_winsorized': array([[0.        , 1.07688717, 1.88851059, ..., 1.21940102, 2.83800382,
         3.64003684],
        [1.07688713, 0.        , 1.70819251, ..., 0.45786842, 2.58662722,
         3.59029333],
        [1.8885106 , 1.70819253, 0.        , ..., 1.53220354, 1.99808026,
         3.97860895],
        ...,
        [1.21940101, 0.45786849, 1.53220353, ..., 0.        , 2.50787408,
         3.829693  ],
        [2.83800382, 2.58662721, 1.99808026, ..., 2.50787408, 0.        ,
         4.38739858],
        [3.64003683, 3.59029333, 3.97860894, ..., 3.829693  , 4.38739858,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Sokal_Matching_MAD': array([[0.        , 1.06915308, 1.73228661, ..., 1.15789936, 2.45834684,
         3.97049139],
        [1.06915305, 0.        , 1.61195487, ..., 0.44488227, 2.24973009,
         3.81621214],
        [1.73228661, 1.61195488, 0.        , ..., 1.4894837 , 1.90536576,
         4.00431571],
        ...,
        [1.15789934, 0.44488231, 1.4894837 , ..., 0.        , 2.30824179,
         4.04102682],
        [2.45834685, 2.24973009, 1.90536577, ..., 2.30824178, 0.        ,
         3.79967402],
        [3.97049139, 3.81621213, 4.0043157 , ..., 4.04102682, 3.79967402,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Jaccard_Matching_trimmed': array([[0.        , 1.05396492, 1.74951184, ..., 1.15390312, 2.67058463,
         3.7103996 ],
        [1.05396495, 0.        , 1.63479813, ..., 0.39866734, 2.51224529,
         3.64245313],
        [1.74951185, 1.63479812, 0.        , ..., 1.49657109, 1.961588  ,
         3.98729219],
        ...,
        [1.15390311, 0.39866728, 1.49657109, ..., 0.        , 2.41854435,
         3.87035377],
        [2.67058464, 2.51224527, 1.961588  , ..., 2.41854434, 0.        ,
         4.69932707],
        [3.71039959, 3.64245311, 3.9872922 , ..., 3.87035377, 4.69932707,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Jaccard_Matching_winsorized': array([[0.        , 1.07688714, 1.88851059, ..., 1.21940102, 2.83800383,
         3.51619033],
        [1.07688715, 0.        , 1.70819252, ..., 0.45786846, 2.58662723,
         3.46347473],
        [1.88851059, 1.70819251, 0.        , ..., 1.53220354, 1.99808026,
         3.86606614],
        ...,
        [1.219401  , 0.45786843, 1.53220353, ..., 0.        , 2.50787409,
         3.72394257],
        [2.83800382, 2.58662721, 1.99808026, ..., 2.50787408, 0.        ,
         4.25828147],
        [3.51619032, 3.46347472, 3.86606614, ..., 3.72394256, 4.25828147,
         0.        ]]),
 'RelMS_Robust_Mahalanobis_Jaccard_Matching_MAD': array([[0.        , 1.06915304, 1.73228661, ..., 1.15789935, 2.45834686,
         3.86694579],
        [1.06915307, 0.        , 1.61195488, ..., 0.4448823 , 2.24973011,
         3.7045599 ],
        [1.7322866 , 1.61195486, 0.        , ..., 1.48948369, 1.90536575,
         3.89571711],
        ...,
        [1.15789934, 0.44488225, 1.48948369, ..., 0.        , 2.30824179,
         3.9478467 ],
        [2.45834686, 2.24973009, 1.90536576, ..., 2.30824179, 0.        ,
         3.64285626],
        [3.86694578, 3.70455988, 3.8957171 , ..., 3.9478467 , 3.64285626,
         0.        ]])}

Computational Cost Testing

Here we use the entire House_Price.csv dataset, which has 1905 rows, to measure the computational cost (in running time) of the new distance metrics included in PyDistances.

import pandas as pd
Data = pd.read_csv('House_Price.csv')
Data = Data.loc[:, ['latitude', 'longitude', 'price', 'size_in_m_2', 'balcony_recode', 'private_garden_recode', 'private_gym_recode', 'quality_recode', 'no_of_bathrooms', 'no_of_bedrooms']]
Data.shape
(1905, 10)
Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='trimmed', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=False)
# Time: 1.11 minutes.

Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='winsorized', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=False)
# Time: 1.15 minutes.

Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='MAD', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=False)
# Time: 1.12 minutes.

Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='trimmed', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=True)
# Time: 1.58 minutes.

Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='winsorized', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=True)
# Time: 1.53 minutes.

Generalized_Gower_Distance_init = GeneralizedGowerDistance(Data=Data, p1=4, p2=3, p3=3, d1='Robust_Mahalanobis', d2='Jaccard', d3='Matching', epsilon=0.05, Method='MAD', alpha=0.1)
D, D_2 = Generalized_Gower_Distance_init.compute(Related_Metric_Scaling=True)
# Time: 1.55 minutes.
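This section does not say how the times were measured. One possible way to reproduce such measurements is sketched below with the standard-library `time.perf_counter`. The helper `timed` and the stand-in workload `pairwise_abs_diff_sum` are hypothetical names introduced only for illustration; they are not part of PyDistances.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload (an assumption, not a PyDistances function) so that the
# sketch runs without the package or the CSV file: an O(n^2) loop over pairs.
def pairwise_abs_diff_sum(values):
    return sum(abs(a - b) for a in values for b in values)


result, seconds = timed(pairwise_abs_diff_sum, list(range(200)))
print(f"Time: {seconds / 60:.4f} minutes")
```

With PyDistances installed, the same helper could wrap one of the calls above, for example `(D, D_2), seconds = timed(Generalized_Gower_Distance_init.compute, Related_Metric_Scaling=False)`. Wall-clock times depend on the machine, so they will not necessarily match the figures reported here.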

We can compare these times with the one obtained with the (simple) Gower distance.

Gower_Dist_Matrix(Data, p1=4, p2=3, p3=3)

# Time: 38 seconds.
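To put the timings reported in this section on a common scale, the small sketch below converts the Generalized Gower times to seconds and divides them by the 38 seconds of the simple Gower distance. The numbers are copied from the comments above; the dictionary layout and the function name `slowdown_vs_gower` are assumptions made for illustration.

```python
# Timings as reported in this section (minutes), keyed by
# (Method, Related_Metric_Scaling).
GOWER_SECONDS = 38.0
generalized_gower_minutes = {
    ("trimmed", False): 1.11,
    ("winsorized", False): 1.15,
    ("MAD", False): 1.12,
    ("trimmed", True): 1.58,
    ("winsorized", True): 1.53,
    ("MAD", True): 1.55,
}


def slowdown_vs_gower(minutes, baseline_seconds=GOWER_SECONDS):
    """Return how many times slower a run is than the simple Gower baseline."""
    return minutes * 60.0 / baseline_seconds


slowdowns = {
    key: slowdown_vs_gower(minutes)
    for key, minutes in generalized_gower_minutes.items()
}

for (method, relms), ratio in slowdowns.items():
    print(f"Method={method!r}, Related_Metric_Scaling={relms}: {ratio:.2f}x Gower")
```

On these figures, the Generalized Gower distance takes roughly 1.8 times as long as the simple Gower distance without Related Metric Scaling, and roughly 2.4 to 2.5 times as long with it.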

Bibliography

Albarrán, I., P. Alonso, and A. Grané. “Profile Identification via Weighted Related Metric Scaling: An Application to Dependent Spanish Children.” Journal of the Royal Statistical Society. Series A, Statistics in Society 178, no. 3 (2015): 593–618. https://doi.org/10.1111/rssa.12084.

Cuadras, C. M., and J. Fortiana. “Chapter 25 - Visualizing Categorical Data with Related Metric Scaling.” In Visualization of Categorical Data, 365–76. Academic Press, 1998. https://doi.org/10.1016/B978-012299045-8/50028-0.

Devlin, S. J., R. Gnanadesikan, and J. R. Kettenring. “Robust Estimation and Outlier Detection with Correlation Coefficients.” Biometrika 62, no. 3 (1975): 531–45. https://doi.org/10.1093/biomet/62.3.531.

Grané, A., G. Manzi, and S. Salini. “Smart Visualization of Mixed Data.” Stats 4, no. 2 (2021): 472–85. https://doi.org/10.3390/stats4020029.

Gower, J. C. “A General Coefficient of Similarity and Some of Its Properties.” Biometrics 27, no. 4 (1971): 857–71. https://doi.org/10.2307/2528823.

Gnanadesikan, R. Methods for Statistical Data Analysis of Multivariate Observations. 2nd ed. New York: John Wiley and Sons, 1997.