Python Scipy Distance Matrix

In this Python tutorial, we will learn about the “Python Scipy Distance Matrix”, where we calculate the distances between the row vectors of matrices or arrays using distance metrics such as Euclidean and Manhattan. We will cover the following topics.

  • Python Scipy Distance Matrix
  • Python Scipy Distance Matrix Cdist
  • Python Scipy Distance Matrix Pdist
  • Python Scipy Distance Matrix Euclidean
  • Python Scipy Distance Matrix Cityblock
  • Python Scipy Distance Matrix Clustering
  • Python Scipy Distance Matrix Directed Hausdorff
  • Python Scipy Distance Matrix Cosine
  • Python Scipy Distance Correlation Matrix

Python Scipy Distance Matrix

A distance matrix holds the pairwise distances between the vectors of one or two matrices. We can compute it using the distance_matrix() function provided by the scipy.spatial module. The inputs are usually 2-D arrays whose rows serve as the vectors (one-dimensional arrays).

The syntax is given below.

scipy.spatial.distance_matrix(x, y, p=2, threshold=1000000)

Where parameters are:

  • x (array_like, shape (M, K)): a matrix of M vectors in K dimensions.
  • y (array_like, shape (N, K)): a matrix of N vectors in K dimensions.
  • p (float): which Minkowski p-norm to use; p=2 gives the Euclidean distance and p=1 the Manhattan distance.
  • threshold (positive int): if M * N * K is greater than the threshold, the algorithm uses a Python loop rather than large temporary arrays.

The distance_matrix() function returns an ndarray of shape (M, N) that holds the distance from each vector in x to each vector in y.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

from scipy import spatial
import numpy as np

Create matrices using the below code.

x_mat = np.array([[2,3],[1,2],[3,3]])
y_mat = np.array([[1,2],[2,0],[5,0]])

Calculate the distance matrix using the below code.

d_matrix = spatial.distance_matrix(x_mat,y_mat,p=2)

View the distance matrix using the below code.

print(d_matrix)

This is how to compute the distance matrix using the method distance_matrix() of module Python scipy.spatial.
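
As a rough cross-check of the example above, the sketch below recomputes the matrix for p=2 (Euclidean) and p=1 (Manhattan) and compares the Euclidean result with a plain NumPy broadcasting formula. The names d_euclid, d_manhattan, and d_manual are illustrative only and do not come from SciPy.

```python
import numpy as np
from scipy import spatial

x_mat = np.array([[2, 3], [1, 2], [3, 3]])
y_mat = np.array([[1, 2], [2, 0], [5, 0]])

# p=2 gives Euclidean distances, p=1 gives Manhattan distances.
d_euclid = spatial.distance_matrix(x_mat, y_mat, p=2)
d_manhattan = spatial.distance_matrix(x_mat, y_mat, p=1)

# Manual Euclidean distances via broadcasting: (3, 1, 2) - (1, 3, 2) -> (3, 3, 2).
diff = x_mat[:, None, :] - y_mat[None, :, :]
d_manual = np.sqrt((diff ** 2).sum(axis=-1))

print(d_euclid.shape)          # one row per vector in x_mat, one column per vector in y_mat
print(np.round(d_euclid, 3))
print(d_manhattan)
print(np.allclose(d_euclid, d_manual))
```

The entry at row 1, column 0 is zero because x_mat[1] and y_mat[0] are the same vector.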

Python Scipy Distance Matrix Cdist

SciPy provides the cdist() function in the scipy.spatial.distance module, which computes the distance between each pair of observations drawn from two input collections.

The syntax is given below.

scipy.spatial.distance.cdist(XA, XB, metric='euclidean')

Where parameters are:

  • XA (array_like): an mA by n array of mA original observations in n dimensions.
  • XB (array_like): an mB by n array of mB original observations in n dimensions.
  • metric (callable or str): the distance metric to use. The name can be “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jensenshannon”, “kulsinski”, “kulczynski1”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, or “seuclidean”, among others; the exact set depends on the installed SciPy version.

The cdist() function returns a distance matrix of shape mA by mB.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import cdist

Create data using the below code.

coord_data = [(25.056, -75.7226),
              (25.7411, -79.1197),
              (25.2897, -79.2294),
              (25.6716, -79.3378)]

Calculate the Euclidean distances between the four two-dimensional coordinates.

cdist(coord_data,coord_data,'euclidean')

This is how to use the method cdist() of Python Scipy to calculate the distance between each pair of the two input collections.
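
The example above passes the same collection twice. As a small sketch of the more general case, the code below uses two hypothetical collections of different sizes (xa and xb are made-up data) to show how the shape of the result follows the inputs and how the metric argument changes the values.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical collections with the same number of columns (n = 2).
xa = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])   # mA = 3 observations
xb = np.array([[0.0, 0.0], [6.0, 8.0]])               # mB = 2 observations

d_euclid = cdist(xa, xb, metric='euclidean')
d_city = cdist(xa, xb, metric='cityblock')

print(d_euclid.shape)  # (3, 2): rows follow xa, columns follow xb
print(d_euclid)
print(d_city)
```

Entry [i, j] of each result is the distance between xa[i] and xb[j], so swapping the two arguments transposes the matrix.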

Python Scipy Distance Matrix Pdist

SciPy provides the pdist() function in the scipy.spatial.distance module, which computes the pairwise distances between observations in n-dimensional space.

The syntax is given below.

scipy.spatial.distance.pdist(X, metric='euclidean')

Where parameters are:

  • X (array_like): an m by n array of m original observations in n dimensions.
  • metric (callable or str): the distance metric to use. The name can be “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jensenshannon”, “kulsinski”, “kulczynski1”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, or “seuclidean”, among others; the exact set depends on the installed SciPy version.

The pdist() function returns Y, a condensed distance matrix of type ndarray. For m observations it is a flat array with m * (m - 1) / 2 entries, one for each pair of observations.

Let’s understand with illustrations by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create data using the below code.

data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]

Calculate the distances between the m points using the Euclidean distance (2-norm) as the metric. The points are stored in data as m row vectors of n dimensions each.

pdist(data,'euclidean')

This is how to compute the pairwise distances between observations in n-dimensional space using the pdist() function of Python Scipy.
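
Because pdist() returns a flat, condensed array, it can help to expand it into a square matrix. The sketch below does this with squareform() from the same module and checks the result against cdist(); the names condensed and square are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform, cdist

data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]

# Condensed form: one entry per unordered pair of points.
condensed = pdist(data, 'euclidean')

# Expand to a symmetric m-by-m matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed.shape)  # (6,) because 4 * 3 / 2 = 6 pairs
print(square.shape)     # (4, 4)
print(np.allclose(square, cdist(data, data, 'euclidean')))
```

The condensed form stores each pair once, which roughly halves the memory needed compared with the full square matrix.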

Python Scipy Distance Matrix Euclidean

SciPy provides the euclidean() function in the scipy.spatial.distance module, which computes the Euclidean distance between two one-dimensional arrays.

The syntax is given below.

scipy.spatial.distance.euclidean(u, v, w=None)

Where parameters are:

  • u (array_like, shape (N,)): the first input array.
  • v (array_like, shape (N,)): the second input array.
  • w (array_like, shape (N,)): the weights for each value in u and v. The default is None, which gives each value a weight of 1.

The euclidean() function returns the Euclidean distance between the vectors u and v, of type double.

Let’s do some examples by following the below steps:

Import the method euclidean() and compute the distance using the below python code.

from scipy.spatial.distance import euclidean
euclidean([2, 1, 0], [1, 1, 2])

The output shows that the Euclidean distance between the two arrays is approximately 2.236, which is the square root of 5.

This is how to compute the euclidean distance using the method euclidean() of Python Scipy.
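
To make the calculation concrete, the sketch below reproduces the same distance with NumPy and then tries the optional w argument. The weight vector is a made-up example that simply switches off the last coordinate.

```python
import numpy as np
from scipy.spatial.distance import euclidean

u = np.array([2, 1, 0])
v = np.array([1, 1, 2])

d_scipy = euclidean(u, v)
d_numpy = np.linalg.norm(u - v)      # sqrt(1 + 0 + 4) = sqrt(5)

# Hypothetical weights that ignore the last coordinate.
w = np.array([1.0, 1.0, 0.0])
d_weighted = euclidean(u, v, w=w)    # sqrt(1*1 + 1*0 + 0*4) = 1

print(d_scipy, d_numpy, d_weighted)
```

With weights, each squared difference is multiplied by its weight before the sum and square root are taken.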

Python Scipy Distance Matrix Cityblock

The Python Scipy module scipy.spatial.distance contains a function cityblock() that computes the Manhattan distance, also known as the City Block distance.

The syntax is given below.

scipy.spatial.distance.cityblock(u, v, w=None)

Where parameters are:

  • u (array_like, shape (N,)): the first input array.
  • v (array_like, shape (N,)): the second input array.
  • w (array_like, shape (N,)): the weights for each value in u and v. The default is None, which gives each value a weight of 1.

The cityblock() function returns the Manhattan (City Block) distance between the vectors u and v, of type double.

Let’s do some examples by following the below steps:

Import the method cityblock() and compute the Manhattan distance using the below python code.

from scipy.spatial.distance import cityblock
cityblock([1, 0, 2], [2, 1, 2])

The output shows that the cityblock (Manhattan) distance between the two arrays is 2.

This is how to compute the cityblock distance using the method cityblock() of Python Scipy.
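
The Manhattan distance is the sum of the absolute coordinate differences. The sketch below checks this against plain NumPy and against minkowski() with p=1, which should give the same value.

```python
import numpy as np
from scipy.spatial.distance import cityblock, minkowski

u = np.array([1, 0, 2])
v = np.array([2, 1, 2])

d_city = cityblock(u, v)
d_numpy = np.abs(u - v).sum()    # |1-2| + |0-1| + |2-2| = 2
d_mink = minkowski(u, v, p=1)    # the Minkowski distance with p=1 is the Manhattan distance

print(d_city, d_numpy, d_mink)
```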

Python Scipy Distance Matrix Clustering

The Python Scipy module scipy.cluster.hierarchy contains a function linkage() that clusters data using hierarchical (agglomerative) methods. But what is hierarchical clustering?

  • Hierarchical clustering, also called hierarchical cluster analysis, is an algorithm that groups objects into clusters based on how similar they are. The result is a set of clusters in which each cluster is distinct from the others, while the objects inside a cluster are broadly similar to one another.

The syntax is given below.

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)

Where parameters are:

  • y (ndarray): a condensed distance matrix, that is, a flat array containing the upper triangle of the distance matrix. This is the form that pdist() returns. Alternatively, an m by n array of m observation vectors in n dimensions can be passed. The condensed distance matrix must not contain NaNs or infs; all elements must be finite.
  • method (str): the linkage algorithm to use, such as 'single', 'complete', 'average', 'weighted', 'centroid', 'median', or 'ward'.
  • metric (str or function): the distance metric to use when y is a collection of observation vectors; it is ignored otherwise. See the pdist() function for the list of valid metrics. A custom distance function can also be used.
  • optimal_ordering (boolean): if True, the linkage matrix is reordered so that the distance between successive leaves is minimal. This gives a more intuitive tree structure when the data are visualized. It defaults to False because the algorithm can be slow, particularly on large datasets.

The linkage() function returns Z, the hierarchical clustering encoded as a linkage matrix, of type ndarray.

Import the required libraries using the below python code.

from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

Create data on which we want to perform the clustering using the below code.

data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]

Pass the above-created data to a method linkage() with ward to compute the clustering using the below code.

res = linkage(data, 'ward')

Plot the dendrogram of the above result using the below code.

fig = plt.figure(figsize=(5, 5))
dendo = dendrogram(res)
plt.show()

In the above code, we passed the result of linkage(data, 'ward') to dendrogram(res) to draw the dendrogram that shows the resulting clusters.


This is how to perform hierarchical clustering on a distance matrix or on raw observations using the linkage() function of Python Scipy.
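
The example above passes raw observations to linkage(). As a sketch of the distance-matrix route, the code below first builds a condensed distance matrix with pdist() and passes that instead, then cuts the tree into two flat clusters with fcluster(). Choosing two clusters is an assumption made only for illustration, and the 'ward' method is meant to be used with Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]

# Condensed Euclidean distance matrix (28 entries for 8 observations).
condensed = pdist(data, metric='euclidean')

# linkage() accepts either the condensed distances or the raw observations.
res_from_dist = linkage(condensed, 'ward')
res_from_obs = linkage(data, 'ward')
same = np.allclose(res_from_dist, res_from_obs)

# Cut the hierarchy into at most two flat clusters.
labels = fcluster(res_from_dist, t=2, criterion='maxclust')

print(same)
print(labels)
```

In this data the small values (1, 1, 2, 1) and the large values (7, 5, 8, 8) end up in separate clusters.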

Python Scipy Distance Matrix Directed Hausdorff

The Python Scipy module scipy.spatial.distance contains a function directed_hausdorff() that computes the directed Hausdorff distance between two two-dimensional arrays of points.

The syntax is given below.

scipy.spatial.distance.directed_hausdorff(u, v, seed=0)

Where parameters are:

  • u (array_like, shape (M, N)): an input array of M points in N dimensions.
  • v (array_like, shape (O, N)): an input array of O points in N dimensions.
  • seed (int): the seed for the random shuffling of u and v that the algorithm performs. The default value of 0 guarantees reproducibility.

The directed_hausdorff() function returns d (the directed Hausdorff distance between the arrays u and v), index_1 (the index of the point in u that contributes to the Hausdorff pair), and index_2 (the index of the point in v that contributes to the Hausdorff pair), of type double, int, and int respectively.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial import distance
import numpy as np

Create two 2-dimensional arrays using the below code.

u_arr = np.array([(2.0, 0.0),
                  (0.0, 2.0),
                  (-2.0, 0.0),
                  (0.0, -2.0)])
v_arr = np.array([(1.0, 0.0),
                  (0.0, 1.0),
                  (-1.0, 0.0),
                  (0.0, -2.0)])

Compute the directed Hausdorff distance of the above-created array using the below code.

distance.directed_hausdorff(u_arr, v_arr)[0]

The output shows that the directed Hausdorff distance from u_arr to v_arr is 1.0.

This is how to compute the directed Hausdorff distance using the method directed_hausdorff() of Python Scipy.
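
The directed distance depends on the order of the arguments. A common way to get the general (symmetric) Hausdorff distance is to take the larger of the two directed distances, as sketched below. The arrays a_arr and b_arr are made-up data added only to show a case where the two directions differ.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

u_arr = np.array([(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)])
v_arr = np.array([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -2.0)])

# The function returns the distance and the indices of the contributing points.
d_uv, idx_u, idx_v = directed_hausdorff(u_arr, v_arr)
d_vu = directed_hausdorff(v_arr, u_arr)[0]
hausdorff = max(d_uv, d_vu)

# Hypothetical arrays where the direction matters.
a_arr = np.array([(0.0, 0.0)])
b_arr = np.array([(0.0, 0.0), (3.0, 4.0)])
d_ab = directed_hausdorff(a_arr, b_arr)[0]   # every point of a_arr has a match in b_arr
d_ba = directed_hausdorff(b_arr, a_arr)[0]   # (3, 4) is 5 away from the only point of a_arr

print(d_uv, d_vu, hausdorff)
print(d_ab, d_ba)
```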

Python Scipy Distance Matrix Cosine

The Python Scipy module scipy.spatial.distance contains a function cosine() that computes the cosine distance between two one-dimensional arrays.

The cosine distance between u and v is defined as:

$$1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$

where u · v is the dot product of u and v.

The syntax is given below.

scipy.spatial.distance.cosine(u, v, w=None)

Where parameters are:

  • u (array_like, shape (N,)): the first input array.
  • v (array_like, shape (N,)): the second input array.
  • w (array_like, shape (N,)): the weights for each value in u and v. The default is None, which gives each value a weight of 1.

The cosine() function returns the cosine distance between the vectors u and v, of type double.

Let’s take an example by following the below steps:

Import the method cosine() and compute the distance using the below python code.

from scipy.spatial.distance import cosine
cosine([0, 1, 1], [1, 0, 1])

The output shows that the cosine distance between the two arrays is 0.5.

This is how to compute the cosine distance using the method cosine() of Python Scipy.
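
As a quick check of where the value 0.5 comes from, the sketch below recomputes the distance with NumPy as one minus the cosine of the angle between the vectors. It also shows that two vectors pointing in the same direction have a distance of (numerically almost) zero, whatever their lengths.

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d_scipy = cosine(u, v)

# 1 - (u . v) / (||u|| * ||v||) = 1 - 1 / (sqrt(2) * sqrt(2)) = 0.5
d_numpy = 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Scaling a vector does not change its direction.
d_parallel = cosine(u, 3 * u)

print(d_scipy, d_numpy, d_parallel)
```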

Python Scipy Distance Correlation Matrix

SciPy provides the correlation() function in the scipy.spatial.distance module, which computes the correlation distance between two one-dimensional arrays.

The correlation distance between u and v is defined as:

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$

where \bar{u} and \bar{v} are the means of the elements of u and v, and x · y is the dot product of x and y.

The syntax is given below.

scipy.spatial.distance.correlation(u, v, w=None, centered=True)

Where parameters are:

  • u (array_like, shape (N,)): the first input array.
  • v (array_like, shape (N,)): the second input array.
  • w (array_like, shape (N,)): the weights for each value in u and v. The default is None, which gives each value a weight of 1.
  • centered (boolean): if True, u and v are centered, that is, their means are subtracted before the distance is computed. True by default.

The correlation() function returns the correlation distance between the one-dimensional arrays u and v, of type double.

Let’s take an example by following the below steps:

Import the method correlation() and compute the distance using the below python code.

from scipy.spatial.distance import correlation
correlation([0, 1, 1], [1, 0, 1])

The output shows that the correlation distance between the two arrays is 1.5.

This is how to compute the correlation distance using the method correlation() of Python Scipy.
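
With the default centering, the correlation distance is one minus the Pearson correlation coefficient. The sketch below checks this using np.corrcoef(); the names r and d_from_r are illustrative.

```python
import numpy as np
from scipy.spatial.distance import correlation

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d_scipy = correlation(u, v)

# Pearson correlation coefficient between u and v (here -0.5).
r = np.corrcoef(u, v)[0, 1]
d_from_r = 1 - r

print(d_scipy, r, d_from_r)
```

Since the coefficient lies between -1 and 1, the correlation distance lies between 0 and 2, and a value above 1 indicates a negative correlation.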

So in this tutorial, we have learned about the “Python Scipy Distance Matrix” and covered the following topics.

  • Python Scipy Distance Matrix
  • Python Scipy Distance Matrix Cdist
  • Python Scipy Distance Matrix Pdist
  • Python Scipy Distance Matrix Euclidean
  • Python Scipy Distance Matrix Cityblock
  • Python Scipy Distance Matrix Clustering
  • Python Scipy Distance Matrix Directed Hausdorff
  • Python Scipy Distance Matrix Cosine
  • Python Scipy Distance Correlation Matrix