# Python Scipy Distance Matrix

In this Python tutorial, we will learn about the “Python Scipy Distance Matrix”: how to calculate the distances between the vectors of matrices or arrays using distance measures such as Euclidean and Manhattan. We will cover the following topics.

• Python Scipy Distance Matrix
• Python Scipy Distance Matrix Cdist
• Python Scipy Distance Matrix Pdist
• Python Scipy Distance Matrix Euclidean
• Python Scipy Distance Matrix Cityblock
• Python Scipy Distance Matrix Clustering
• Python Scipy Distance Matrix Directed Hausdorff
• Python Scipy Distance Matrix Cosine
• Python Scipy Distance Correlation Matrix

## Python Scipy Distance Matrix

A distance matrix contains the pairwise distances between the vectors of one or two matrices. We can compute it with the `distance_matrix()` method provided by the `scipy.spatial` module. The matrices are usually 2-D arrays whose rows serve as the vectors (one-dimensional arrays).

The syntax is given below.

``scipy.spatial.distance_matrix(x, y, threshold=1000000, p=2)``

Where parameters are:

• x (array_like, (M, K)): Matrix of M vectors in K dimensions.
• y (array_like, (N, K)): Matrix of N vectors in K dimensions.
• threshold (positive int): If M * N * K is greater than the threshold, the algorithm uses a Python loop instead of large temporary arrays.
• p (float): Which Minkowski p-norm to use; p=1 gives the Manhattan distance and p=2 the Euclidean distance.

The method `distance_matrix()` returns an ndarray of shape (M, N) that holds the distance from each vector in x to each vector in y.
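
To make the roles of the shapes and of `p` concrete, here is a minimal sketch. The arrays `x` and `y` are hypothetical inputs chosen only for illustration; they are not part of the example that follows.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical inputs: 4 vectors and 2 vectors, all in 3 dimensions.
x = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

d_euclid = distance_matrix(x, y)            # p=2 (default): Euclidean distance
d_manhattan = distance_matrix(x, y, p=1)    # p=1: Manhattan distance

# One row per vector in x, one column per vector in y.
print(d_euclid.shape)       # (4, 2)

# Manhattan distances from x[0] = [0, 1, 2] to [0, 0, 0] and [1, 1, 1] are 3 and 2.
print(d_manhattan[0])
```

The result always has one row for each vector in `x` and one column for each vector in `y`, whatever value of `p` is used.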

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

``````from scipy import spatial
import numpy as np``````

Create matrices using the below code.

``````x_mat = np.array([[2,3],[1,2],[3,3]])
y_mat = np.array([[1,2],[2,0],[5,0]])``````

Calculate the distance matrix using the below code.

``d_matrix = spatial.distance_matrix(x_mat,y_mat,p=2)``

View the distance matrix using the below code.

``print(d_matrix)``
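
As a sanity check, the same matrix can be reproduced with plain NumPy broadcasting. This is only a sketch of the underlying arithmetic, not a description of how SciPy computes the result internally; the names `diff` and `manual` are introduced here for illustration.

```python
import numpy as np
from scipy import spatial

x_mat = np.array([[2, 3], [1, 2], [3, 3]])
y_mat = np.array([[1, 2], [2, 0], [5, 0]])

d_matrix = spatial.distance_matrix(x_mat, y_mat, p=2)

# Broadcast to shape (3, 3, 2), then take the 2-norm over the last axis.
diff = x_mat[:, np.newaxis, :] - y_mat[np.newaxis, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))

# The first row is approximately 1.4142, 3.0, 4.2426.
print(np.round(d_matrix, 4))
print(np.allclose(d_matrix, manual))    # True
```

The entry in row 1, column 0 is 0 because the second vector of `x_mat` and the first vector of `y_mat` are both `[1, 2]`.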

This is how to compute the distance matrix using the method `distance_matrix()` of module Python `scipy.spatial`.

## Python Scipy Distance Matrix Cdist

Python Scipy provides a method `cdist()` in the module `scipy.spatial.distance` that calculates the distance between each pair of observations from two input collections.

The syntax is given below.

``scipy.spatial.distance.cdist(XA, XB, metric='euclidean')``

Where parameters are:

• XA (array_like): An mA by n array of mA original observations in n dimensions.
• XB (array_like): An mB by n array of mB original observations in n dimensions.
• metric (callable, str): The distance metric to use. The distance function can be “canberra,” “braycurtis,” “chebyshev,” “correlation,” “cityblock,” “cosine,” “euclidean,” “dice,” “hamming,” “kulsinski,” “jensenshannon,” “kulczynski1,” “matching,” “mahalanobis,” “minkowski,” “russellrao,” “rogerstanimoto,” “seuclidean,” among others. The names that are accepted depend on the installed SciPy version.

The method `cdist()` returns a distance matrix of size mA by mB.
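
The following sketch shows the mA by mB shape with two collections of different sizes. `XA` and `XB` here are small hypothetical arrays chosen so that the distances can be checked by hand.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical data: 3 observations in XA and 2 in XB, each with 2 features.
XA = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
XB = np.array([[0.0, 0.0], [3.0, 4.0]])

d_euclid = cdist(XA, XB)                      # the default metric is 'euclidean'
d_city = cdist(XA, XB, metric='cityblock')    # Manhattan distance

print(d_euclid.shape)    # (3, 2), that is mA by mB
print(d_euclid[0])       # distances from (0, 0): 0 and 5
print(d_city[0])         # distances from (0, 0): 0 and 7
```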

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

``from scipy.spatial.distance import cdist``

Create data using the below code.

``````coord_data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]``````

Calculate the Euclidean distances between the four two-dimensional coordinates.

``cdist(coord_data,coord_data,'euclidean')``
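
Because the same collection is passed twice, the result is a symmetric 4 by 4 matrix with zeros on the diagonal. One possible use, sketched below, is to look up the nearest other point for each coordinate. The names `masked` and `nearest` are introduced here for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist

coord_data = [(25.056, -75.7226),
              (25.7411, -79.1197),
              (25.2897, -79.2294),
              (25.6716, -79.3378)]

d = cdist(coord_data, coord_data, 'euclidean')

# Symmetric matrix with a zero diagonal (each point is at distance 0 from itself).
print(np.allclose(d, d.T))    # True

# Ignore the diagonal, then take the column index of the smallest distance per row.
masked = d.copy()
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)
print(nearest)                # [1 3 3 1]
```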

This is how to use the method `cdist()` of Python Scipy to calculate the distance between each pair of the two input collections.

## Python Scipy Distance Matrix Pdist

Python Scipy provides a method `pdist()` in the module `scipy.spatial.distance` that calculates the pairwise distances between observations in n-dimensional space.

The syntax is given below.

``scipy.spatial.distance.pdist(X, metric='euclidean')``

Where parameters are:

• X (array_like): An m by n array of m original observations in n dimensions.
• metric (callable, str): The distance metric to use. The distance function can be “canberra,” “braycurtis,” “chebyshev,” “correlation,” “cityblock,” “cosine,” “euclidean,” “dice,” “hamming,” “kulsinski,” “jensenshannon,” “kulczynski1,” “matching,” “mahalanobis,” “minkowski,” “russellrao,” “rogerstanimoto,” “seuclidean,” among others. The names that are accepted depend on the installed SciPy version.

The method `pdist()` returns `Y` (a condensed distance matrix) of type ndarray.
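
A condensed distance matrix stores only the upper triangle of the full matrix as a flat array. The sketch below uses three hypothetical collinear points so that the values are easy to verify, and uses `squareform()` from the same module to expand the condensed form.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: three collinear points, so the distances are 5, 10 and 5.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

condensed = pdist(points)         # order: d(0,1), d(0,2), d(1,2)
square = squareform(condensed)    # full symmetric 3 x 3 matrix with a zero diagonal

print(condensed)        # the three distances 5, 10 and 5
print(square.shape)     # (3, 3)
```

For m observations the condensed array has m * (m - 1) / 2 entries, which is 3 in this case.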

Let’s understand with illustrations by following the below steps:

Import the required libraries using the below python code.

``from scipy.spatial.distance import pdist``

Create data using the below code.

``````data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]``````

Use the Euclidean distance (2-norm) as the metric to calculate the distances between the m points. The points are arranged in the data as m row vectors of n dimensions.

``pdist(data,'euclidean')``
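
The condensed result holds the same distances as the full matrix that `cdist()` returns for the same data. The following sketch checks this; it assumes nothing beyond the `data` list defined above.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]

condensed = pdist(data, 'euclidean')
full = cdist(data, data, 'euclidean')

# 4 points give 4 * 3 / 2 = 6 pairwise distances.
print(condensed.shape)                             # (6,)

# Expanding the condensed form reproduces the full matrix from cdist.
print(np.allclose(squareform(condensed), full))    # True
```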

This is how to compute the pairwise distances between observations in n-dimensional space using the method `pdist()` of Python Scipy.

## Python Scipy Distance Matrix Euclidean

Python Scipy provides a method `euclidean()` in the module `scipy.spatial.distance` that calculates the Euclidean distance between two one-dimensional arrays.

The syntax is given below.

``scipy.spatial.distance.euclidean(u, v, w=None)``

Where parameters are:

• u (array_like): Input 1-D array.
• v (array_like): Input 1-D array.
• w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method `euclidean()` returns `euclidean` (the Euclidean distance between the vectors u and v) of type double.

Let’s do some examples by following the below steps:

Import the method `euclidean()` and compute the distance using the below python code.

``````from scipy.spatial.distance import euclidean
euclidean([2, 1, 0], [1, 1, 2])``````

The output shows that the Euclidean distance between the two arrays is approximately 2.236.
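
The value can be checked against `np.linalg.norm`, and the optional weights can be tried out as well. In the sketch below, the weight vector `w` is a hypothetical choice made only to show the effect of the parameter.

```python
import numpy as np
from scipy.spatial.distance import euclidean

u = np.array([2, 1, 0])
v = np.array([1, 1, 2])

d = euclidean(u, v)
d_manual = np.linalg.norm(u - v)     # sqrt(1 + 0 + 4) = sqrt(5)

# Hypothetical weights: the third component counts only a quarter as much.
w = np.array([1.0, 1.0, 0.25])
d_weighted = euclidean(u, v, w=w)    # sqrt(1*1 + 1*0 + 0.25*4) = sqrt(2)

print(d, d_manual, d_weighted)
```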

This is how to compute the euclidean distance using the method `euclidean()` of Python Scipy.

## Python Scipy Distance Matrix Cityblock

The Python Scipy module `scipy.spatial.distance` contains a method `cityblock()` that calculates the Manhattan distance, also known as the City Block distance.

The syntax is given below.

``scipy.spatial.distance.cityblock(u, v, w=None)``

Where parameters are:

• u (array_like): Input 1-D array.
• v (array_like): Input 1-D array.
• w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method `cityblock()` returns `cityblock` (the Manhattan distance between the vectors u and v) of type double.

Let’s do some examples by following the below steps:

Import the method `cityblock()` and compute the Manhattan distance using the below python code.

``````from scipy.spatial.distance import cityblock
cityblock([1, 0, 2], [2, 1, 2])``````

The output shows that the cityblock (Manhattan) distance between the two arrays is 2.
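
The Manhattan distance is the sum of the absolute differences of the components, which the short sketch below verifies. It also shows that `cdist()` with `metric='cityblock'` gives the same value for a single pair of vectors.

```python
import numpy as np
from scipy.spatial.distance import cdist, cityblock

u = np.array([1, 0, 2])
v = np.array([2, 1, 2])

d = cityblock(u, v)
d_manual = np.abs(u - v).sum()    # |1-2| + |0-1| + |2-2| = 2

# cdist expects 2-D inputs, so each vector is wrapped in a list.
d_cdist = cdist([u], [v], metric='cityblock')[0, 0]

print(d, d_manual, d_cdist)
```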

This is how to compute the cityblock distance using the method `cityblock()` of Python Scipy.

## Python Scipy Distance Matrix Clustering

The Python Scipy module `scipy.cluster.hierarchy` contains a method `linkage()` that clusters data using hierarchical (agglomerative) methods. But what is hierarchical clustering?

• Hierarchical clustering, also referred to as hierarchical cluster analysis, is an algorithm that groups objects into clusters based on how similar they are. The result is a set of clusters, each distinct from the others, whose members are broadly similar to one another.

The syntax is given below.

``scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)``

Where parameters are:

• y (ndarray): A condensed distance matrix, which is a flat array containing the upper triangle of the distance matrix. This is the form that `pdist()` returns. Alternatively, a set of m observation vectors in n dimensions can be passed as an m by n array. All elements of the condensed distance matrix must be finite, so it cannot contain NaNs or infs.
• method (str): The linkage algorithm to use, for example “single”, “complete”, “average”, or “ward”.
• optimal_ordering (boolean): If True, the linkage matrix is reordered so that the distance between successive leaves is minimal. This gives a more intuitive tree structure when the data are visualized. It defaults to False because the algorithm can be slow, especially on large datasets.
• metric (str or function): The distance metric to use when y is a collection of observation vectors; it is ignored otherwise. See the `pdist()` function for the list of valid distance metrics. A custom distance function can also be passed.

The method `linkage()` returns `Z` (the linkage matrix that encodes the hierarchical clustering) of type ndarray.
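
To illustrate the two accepted forms of `y`, the sketch below passes the same hypothetical observations once as an array of observation vectors and once as a condensed distance matrix from `pdist()`. The “average” method is chosen arbitrarily for the illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical 1-D observations, stored as an 8 by 1 array.
obs = np.array([[1.0], [7.0], [1.0], [5.0], [2.0], [8.0], [8.0], [1.0]])

Z_from_obs = linkage(obs, method='average', metric='euclidean')
Z_from_condensed = linkage(pdist(obs, metric='euclidean'), method='average')

# n observations produce n - 1 merges, each described by 4 columns.
print(Z_from_obs.shape)                             # (7, 4)
print(np.allclose(Z_from_obs, Z_from_condensed))    # True
```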

Import the required libraries using the below python code.

``````from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt``````

Create data on which we want to perform the clustering using the below code.

``data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]``

Pass the data created above to the method `linkage()` with the `ward` method to compute the clustering using the below code.

``res = linkage(data, 'ward')``

Plot the dendrogram of the above result using the below code.

``````fig = plt.figure(figsize=(5, 5))
dendo = dendrogram(res)
plt.show()``````

In the above code, we passed the result of `linkage(data, 'ward')` to the method `dendrogram(res)` to draw the dendrogram of the resulting clusters.
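
If cluster labels are needed rather than a plot, the linkage matrix can be cut into flat clusters with `fcluster()` from the same module. This is a minimal sketch that goes beyond the steps above; the choice of two clusters is an assumption made for illustration.

```python
from scipy.cluster.hierarchy import fcluster, linkage

data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]
res = linkage(data, 'ward')

# Cut the tree so that at most 2 flat clusters remain.
labels = fcluster(res, t=2, criterion='maxclust')

# The small values (1, 1, 2, 1) share one label and the large values (7, 5, 8, 8) the other.
for value, label in zip(data, labels):
    print(value[0], '-> cluster', label)
```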

This is how to perform hierarchical clustering using the method `linkage()` of Python Scipy.

## Python Scipy Distance Matrix Directed Hausdorff

The Python Scipy module `scipy.spatial.distance` contains a method `directed_hausdorff()` that finds the directed Hausdorff distance between two 2-D arrays.

The syntax is given below.

``scipy.spatial.distance.directed_hausdorff(u, v, seed=0)``

Where parameters are:

• u (array_like, (M, N)): Input array of M points in N dimensions.
• v (array_like, (O, N)): Input array of O points in N dimensions.
• seed (int): The seed used to randomly shuffle the values of u and v. The default value of 0 ensures reproducibility.

The method `directed_hausdorff()` returns `d` (the directed Hausdorff distance between the arrays u and v), `index_1` (the index of the point in u that contributes to the Hausdorff pair), and `index_2` (the index of the point in v that contributes to the Hausdorff pair), of type double, int, and int respectively.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

``````from scipy.spatial import distance
import numpy as np``````

Create two 2-dimensional arrays using the below code.

``````u_arr = np.array([(2.0, 0.0),
(0.0, 2.0),
(-2.0, 0.0),
(0.0, -2.0)])
v_arr = np.array([(1.0, 0.0),
(0.0, 1.0),
(-1.0, 0.0),
(0.0, -2.0)])``````

Compute the directed Hausdorff distance of the above-created array using the below code.

``distance.directed_hausdorff(u_arr, v_arr)[0]``

The output shows that the directed Hausdorff distance between the given arrays is 1.
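
The directed distance depends on the order of the arguments. The general (symmetric) Hausdorff distance is commonly taken as the larger of the two directed distances, which the following sketch computes for the same arrays.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

u_arr = np.array([(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)])
v_arr = np.array([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -2.0)])

# The function returns the distance and the indices of the contributing points.
d_uv, index_u, index_v = directed_hausdorff(u_arr, v_arr)
d_vu = directed_hausdorff(v_arr, u_arr)[0]

# Symmetric Hausdorff distance: the larger of the two directed distances.
hausdorff = max(d_uv, d_vu)
print(d_uv, d_vu, hausdorff)    # 1.0 1.0 1.0
```

For these two arrays both directions give 1, so the symmetric distance is also 1.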

This is how to compute the directed Hausdorff distance using the method `directed_hausdorff()` of Python Scipy.

## Python Scipy Distance Matrix Cosine

The Python Scipy module `scipy.spatial.distance` contains a method `cosine()` that computes the cosine distance between two one-dimensional arrays.

The cosine distance between u and v is defined as:

$$1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$

The syntax is given below.

``scipy.spatial.distance.cosine(u, v, w=None)``

Where parameters are:

• u (array_like): Input 1-D array.
• v (array_like): Input 1-D array.
• w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method `cosine()` returns `cosine` (the cosine distance between the vectors u and v) of type double.

Let’s take an example by following the below steps:

Import the method `cosine()` and compute the distance using the below python code.

``````from scipy.spatial.distance import cosine
cosine([0, 1, 1], [1, 0, 1])``````

The output shows that the cosine distance between the two arrays is 0.5.
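
The value can be reproduced by hand: one minus the dot product divided by the product of the two vector norms. The sketch below does this with NumPy; `d_manual` is a name introduced here for illustration.

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d = cosine(u, v)

# u . v = 1 and each norm is sqrt(2), so the distance is 1 - 1/2 = 0.5.
d_manual = 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(d, d_manual)
```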

This is how to compute the cosine distance using the method `cosine()` of Python Scipy.

## Python Scipy Distance Correlation Matrix

Python Scipy provides a method `correlation()` in the module `scipy.spatial.distance` that computes the correlation distance between two one-dimensional arrays.

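The correlation distance between u and v is defined as:

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$

where $\bar{u}$ and $\bar{v}$ are the means of the elements of u and v.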

The syntax is given below.

``scipy.spatial.distance.correlation(u, v, w=None, centered=True)``

Where parameters are:

• u (array_like): Input 1-D array.
• v (array_like): Input 1-D array.
• w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.
• centered (boolean): If True, u and v are centered by subtracting their means. Defaults to True.

The method `correlation()` returns `correlation` (the correlation distance between the one-dimensional arrays u and v) of type double.

Let’s take an example by following the below steps:

Import the method `correlation()` and compute the distance using the below python code.

``````from scipy.spatial.distance import correlation
correlation([0, 1, 1], [1, 0, 1])``````

The output shows that the correlation distance between the two arrays is 1.5.
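
With the default settings, the correlation distance equals one minus the Pearson correlation coefficient. The sketch below checks this relationship with `np.corrcoef`; the variable `r` is introduced here for illustration.

```python
import numpy as np
from scipy.spatial.distance import correlation

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d = correlation(u, v)

# Pearson correlation of u and v is -0.5, so the distance is 1 - (-0.5) = 1.5.
r = np.corrcoef(u, v)[0, 1]

print(d, 1 - r)
```

A distance above 1 therefore indicates that the two arrays are negatively correlated.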

This is how to compute the correlation distance using the method `correlation()` of Python Scipy.

So in this tutorial, we have learned about the “Python Scipy Distance Matrix” and covered the following topics.

• Python Scipy Distance Matrix
• Python Scipy Distance Matrix Cdist
• Python Scipy Distance Matrix Pdist
• Python Scipy Distance Matrix Euclidean
• Python Scipy Distance Matrix Cityblock
• Python Scipy Distance Matrix Clustering
• Python Scipy Distance Matrix Directed Hausdorff
• Python Scipy Distance Matrix Cosine
• Python Scipy Distance Correlation Matrix