In this Python tutorial, we will learn about the “**Python Scipy Distance Matrix**”, where we will calculate the distances between matrices or arrays using different distance metrics such as Euclidean, Manhattan, etc., and cover the following topics.

- Python Scipy Distance Matrix
- Python Scipy Distance Matrix Cdist
- Python Scipy Distance Matrix Pdist
- Python Scipy Distance Matrix Euclidean
- Python Scipy Distance Matrix Cityblock
- Python Scipy Distance Matrix Clustering
- Python Scipy Distance Matrix Directed Hausdorff
- Python Scipy Distance Matrix Cosine
- Python Scipy Distance Correlation Matrix


## Python Scipy Distance Matrix

A distance matrix contains the pairwise distances between the vectors of one or two matrices. We can compute the distance matrix using the method *distance_matrix()* of the module *scipy.spatial*. In most cases, the matrices are 2-D arrays, with each row serving as one vector (a one-dimensional array).

The syntax is given below.

`scipy.spatial.distance_matrix(x, y, threshold=1000000, p=2)`

Where parameters are:

- **x (array_data, (M, K)):** Matrix of M vectors in K dimensions.
- **y (array_data, (N, K)):** Matrix of N vectors in K dimensions.
- **threshold (positive int):** If M * N * K is greater than the threshold, the algorithm uses a Python loop rather than large temporary arrays.
- **p (float):** Which Minkowski p-norm to apply (p=1 gives the Manhattan distance, p=2 the Euclidean distance).

The method *distance_matrix()* returns an ndarray of shape (M, N) that contains the distance between each vector in x and each vector in y.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

```
from scipy import spatial
import numpy as np
```

Create matrices using the below code.

```
x_mat = np.array([[2,3],[1,2],[3,3]])
y_mat = np.array([[1,2],[2,0],[5,0]])
```

Calculate the distance matrix using the below code.

`d_matrix = spatial.distance_matrix(x_mat,y_mat,p=2)`

View the distance matrix using the below code.

`print(d_matrix)`
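As a rough cross-check of what `distance_matrix()` returns, the sketch below recomputes the same matrix with plain NumPy broadcasting and also tries `p=1`. The names `manual` and `d_manhattan` are illustrative only and are not part of SciPy.

```python
import numpy as np
from scipy import spatial

x_mat = np.array([[2, 3], [1, 2], [3, 3]])
y_mat = np.array([[1, 2], [2, 0], [5, 0]])

# Euclidean (p=2) distance between every row of x_mat and every row of y_mat
d_matrix = spatial.distance_matrix(x_mat, y_mat, p=2)

# Same computation by hand: broadcast to shape (3, 3, 2), then take the
# Euclidean norm over the last axis
diff = x_mat[:, np.newaxis, :] - y_mat[np.newaxis, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))

# p=1 switches to the Manhattan (cityblock) distance
d_manhattan = spatial.distance_matrix(x_mat, y_mat, p=1)

print(d_matrix.shape)         # one row per vector in x_mat, one column per vector in y_mat
print(np.round(d_matrix, 3))
print(d_manhattan)
```

Entry `[i, j]` of the result is the distance between row `i` of `x_mat` and row `j` of `y_mat`, so `d_matrix[1, 0]` is 0 here because both rows are `[1, 2]`.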

This is how to compute the distance matrix using the method *distance_matrix()* of the Python module **scipy.spatial**.


## Python Scipy Distance Matrix Cdist

Python Scipy contains a method *cdist()* in the module *scipy.spatial.distance* that calculates the distance between each pair of the two input collections.

The syntax is given below.

`scipy.spatial.distance.cdist(XA, XB, metric='euclidean')`

Where parameters are:

- **XA (array_data):** An array of m_A original observations in n dimensions, of shape m_A by n.
- **XB (array_data):** An array of m_B original observations in n dimensions, of shape m_B by n.
- **metric (callable, str):** The distance metric to use. The distance function can be “canberra”, “braycurtis”, “chebyshev”, “correlation”, “cityblock”, “cosine”, “euclidean”, “dice”, “hamming”, “kulsinski”, “jensenshannon”, “kulczynski1”, “matching”, “mahalanobis”, “minkowski”, “russellrao”, “rogerstanimoto”, “seuclidean”.

The method `cdist()` returns a distance matrix of size m_A by m_B.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

`from scipy.spatial.distance import cdist`

Create data using the below code.

```
coord_data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]
```

Calculate the Euclidean distances between the four two-dimensional coordinates.

`cdist(coord_data,coord_data,'euclidean')`
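The example above passes the same collection twice, which hides the fact that `cdist()` accepts two different collections. The following is a minimal sketch of that case; `other_points` is a made-up second collection used only for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

coord_data = [(25.056, -75.7226),
              (25.7411, -79.1197),
              (25.2897, -79.2294),
              (25.6716, -79.3378)]

# Hypothetical second collection with two points (not from the tutorial)
other_points = [(25.0, -76.0), (26.0, -79.0)]

# Same collection twice: a square, symmetric matrix with a zero diagonal
d_self = cdist(coord_data, coord_data, 'euclidean')

# Two different collections and a different metric: a 4 by 2 matrix
d_cross = cdist(coord_data, other_points, 'cityblock')

print(d_self.shape)
print(d_cross.shape)
print(np.round(d_cross, 4))
```

The number of rows always follows the first argument and the number of columns the second, which matches the m_A by m_B shape described above.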

This is how to use the method *cdist()* of Python Scipy to calculate the distance between each pair of the two input collections.


## Python Scipy Distance Matrix Pdist

Python Scipy contains a method *pdist()* in the module *scipy.spatial.distance* that calculates the pairwise distances between observations in n-dimensional space.

The syntax is given below.

`scipy.spatial.distance.pdist(X, metric='euclidean')`

Where parameters are:

- **X (array_data):** An array of m original observations in n dimensions, arranged m by n.
- **metric (callable, str):** The distance metric to use. The distance function can be “canberra”, “braycurtis”, “chebyshev”, “correlation”, “cityblock”, “cosine”, “euclidean”, “dice”, “hamming”, “kulsinski”, “jensenshannon”, “kulczynski1”, “matching”, “mahalanobis”, “minkowski”, “russellrao”, “rogerstanimoto”, “seuclidean”.

The method `pdist()` returns `Y` (a condensed distance matrix) of type ndarray.

Let’s understand with an illustration by following the below steps:

Import the required libraries using the below python code.

`from scipy.spatial.distance import pdist`

Create data using the below code.

```
data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]
```

Calculate the distances between the m points using the Euclidean distance (2-norm) as the metric. In the data, the points are arranged as m row vectors of n dimensions each.

`pdist(data,'euclidean')`
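Because `pdist()` returns a condensed (flat) array rather than a square matrix, it can help to expand it with `squareform()` from the same module. The sketch below assumes the `data` list from this section; the names `condensed` and `square` are illustrative.

```python
from scipy.spatial.distance import pdist, squareform

data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]

# Condensed form: one entry per unordered pair, m * (m - 1) / 2 = 6 values
# ordered as (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
condensed = pdist(data, 'euclidean')

# Expand to the full symmetric 4 by 4 distance matrix
square = squareform(condensed)

print(condensed.shape)
print(square.shape)
print(square.round(4))
```

The condensed form stores each pair only once, so it is the more compact representation, while the square form is easier to index by point.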

This is how to compute the pairwise distances between observations in n-dimensional space using the method *pdist()* of Python Scipy.


## Python Scipy Distance Matrix Euclidean

Python Scipy contains a method *euclidean()* in the module *scipy.spatial.distance* that calculates the Euclidean distance between two one-dimensional arrays.

The syntax is given below.

`scipy.spatial.distance.euclidean(u, v, w=None)`

Where parameters are:

- **u (array_data):** First input array (one-dimensional).
- **v (array_data):** Second input array (one-dimensional).
- **w (array_data):** The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method *euclidean()* returns `euclidean` (the Euclidean distance between the two vectors u and v) of type double.

Let’s do some examples by following the below steps:

Import the method *euclidean()* and compute the distance using the below Python code.

```
from scipy.spatial.distance import euclidean
euclidean([2, 1, 0], [1, 1, 2])
```

The output shows that the Euclidean distance between the two arrays is approximately 2.236, that is, the square root of 5.
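To make the result less of a black box, here is a small sketch that reproduces it from the definition and then tries the `w` parameter. The weights are an arbitrary illustration, not values from the tutorial.

```python
import numpy as np
from scipy.spatial.distance import euclidean

u = np.array([2, 1, 0])
v = np.array([1, 1, 2])

dist = euclidean(u, v)

# By definition: sqrt((2-1)^2 + (1-1)^2 + (0-2)^2) = sqrt(5)
manual = np.sqrt(np.sum((u - v) ** 2))

# Illustrative weights: the third coordinate counts a quarter as much,
# giving sqrt(1*1 + 1*0 + 0.25*4) = sqrt(2)
w = np.array([1.0, 1.0, 0.25])
weighted = euclidean(u, v, w)

print(dist, manual, weighted)
```

With weights, each squared difference is multiplied by its weight before summing, so down-weighting the third coordinate shrinks the distance.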

This is how to compute the Euclidean distance using the method *euclidean()* of Python Scipy.


## Python Scipy Distance Matrix Cityblock

The Python Scipy module *scipy.spatial.distance* contains a method *cityblock()* that calculates the Manhattan distance, also known as the City Block distance.

The syntax is given below.

`scipy.spatial.distance.cityblock(u, v, w=None)`

Where parameters are:

- **u (array_data):** First input array (one-dimensional).
- **v (array_data):** Second input array (one-dimensional).
- **w (array_data):** The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method *cityblock()* returns `cityblock` (the Manhattan (cityblock) distance between the two vectors u and v) of type double.

Let’s do some examples by following the below steps:

Import the method *cityblock()* and compute the Manhattan distance using the below Python code.

```
from scipy.spatial.distance import cityblock
cityblock([1, 0, 2], [2, 1, 2])
```

The output shows that the cityblock (Manhattan) distance between the two arrays is 2.
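The value can be checked by hand as a sum of absolute differences. The sketch below does that with NumPy and also tries an arbitrary, illustrative weight vector.

```python
import numpy as np
from scipy.spatial.distance import cityblock

u = np.array([1, 0, 2])
v = np.array([2, 1, 2])

dist = cityblock(u, v)

# By definition: |1-2| + |0-1| + |2-2| = 2
manual = np.abs(u - v).sum()

# Illustrative weights: the first coordinate counts double,
# giving 2*1 + 1*1 + 1*0 = 3
w = np.array([2, 1, 1])
weighted = cityblock(u, v, w)

print(dist, manual, weighted)
```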

This is how to compute the cityblock distance using the method *cityblock()* of Python Scipy.


## Python Scipy Distance Matrix Clustering

The Python Scipy module *scipy.cluster.hierarchy* contains a method *linkage()* that clusters data using hierarchical (agglomerative) methods. But first, *“What is hierarchical clustering?”*

- An algorithm called hierarchical clustering, commonly referred to as hierarchical cluster analysis, divides objects into clusters based on how similar they are. The result is a collection of clusters, each of which differs from the others while having things that are generally similar to one another.

The syntax is given below.

`scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)`

Where parameters are:

- **y (ndarray):** A condensed distance matrix, that is, a flat array containing the upper triangle of the distance matrix. This is the form that `pdist` returns. Alternatively, an m by n array of m observation vectors in n dimensions can be passed. The condensed distance matrix must not contain NaNs or infs; all elements must be finite.
- **method (str):** The linkage algorithm to use.
- **optimal_ordering (boolean):** If True, the linkage matrix is reordered so that the distance between successive leaves is minimal. This gives a more intuitive tree structure when the data are visualized. Defaults to False because the algorithm can be slow, particularly on large datasets.
- **metric (str or function):** The distance metric to use when y is a collection of observation vectors; ignored otherwise. See the `pdist` function for a list of valid distance metrics. A custom distance function can also be used.

The method `linkage()` returns `Z` (the linkage matrix that encodes the hierarchical clustering) of type ndarray.

Import the required libraries using the below Python code.

```
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
```

Create data on which we want to perform the clustering using the below code.

`data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]`

Pass the above-created data to the method *linkage()* with the `'ward'` method to compute the clustering using the below code.

`res = linkage(data, 'ward')`

Plot the dendrogram of the above result using the below code.

```
fig = plt.figure(figsize=(5, 5))
dendo = dendrogram(res)
plt.show()  # display the figure when running as a script
```

In the above code, we passed the result of the method *linkage(data, 'ward')* to the method *dendrogram(res)* to show the dendrogram of the clusters.
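Since this section is about distance matrices, it may help to see that `linkage()` also accepts the condensed distance matrix produced by `pdist()`, and that flat cluster labels can be cut from the tree with `fcluster()`. This is a sketch under the assumption that two clusters are wanted; the names `res_from_condensed` and `labels` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]

# Variant 1: pass the raw observations (as in the tutorial)
res = linkage(data, 'ward')

# Variant 2: compute the condensed Euclidean distance matrix first
# and pass that instead; the resulting linkage matrix is the same
condensed = pdist(data, 'euclidean')
res_from_condensed = linkage(condensed, 'ward')

# Cut the tree into at most two flat clusters (assumed target count)
labels = fcluster(res, t=2, criterion='maxclust')

print(res.shape)   # (n - 1, 4): one row per merge
print(labels)      # one cluster id per observation
```

For this data the small values (1, 1, 2, 1) end up in one cluster and the large values (7, 5, 8, 8) in the other, which matches the two main branches of the dendrogram.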

This is how to perform distance-based hierarchical clustering using the method *linkage()* of Python Scipy.


## Python Scipy Distance Matrix Directed Hausdorff

The Python Scipy module *scipy.spatial.distance* contains a method *directed_hausdorff()* that finds the directed Hausdorff distance between two two-dimensional arrays.

The syntax is given below.

`scipy.spatial.distance.directed_hausdorff(u, v, seed=0)`

Where parameters are:

- **u (array_data, (M, N)):** Input array of M points in N dimensions.
- **v (array_data, (O, N)):** Input array of O points in N dimensions.
- **seed (int):** Seed for the random shuffling of u and v. The default value of 0 guarantees reproducibility.

The method *directed_hausdorff()* returns `d` (the directed Hausdorff distance between the arrays u and v), `index_1` (the index of the point in u that contributes to the Hausdorff pair), and `index_2` (the index of the point in v that contributes to the Hausdorff pair), of type double, int, and int respectively.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

```
from scipy.spatial import distance
import numpy as np
```

Create two 2-dimensional arrays using the below code.

```
u_arr = np.array([(2.0, 0.0),
(0.0, 2.0),
(-2.0, 0.0),
(0.0, -2.0)])
v_arr = np.array([(1.0, 0.0),
(0.0, 1.0),
(-1.0, 0.0),
(0.0, -2.0)])
```

Compute the directed Hausdorff distance of the above-created array using the below code.

`distance.directed_hausdorff(u_arr, v_arr)[0]`

The output shows that the directed Hausdorff distance from `u_arr` to `v_arr` is 1.
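The directed distance depends on the order of the arguments, so a common pattern is to compute both directions and take the larger one as the general (symmetric) Hausdorff distance. The sketch below does this and cross-checks the result against a brute-force computation with `cdist()`; the names `brute_uv` and `brute_vu` are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist, directed_hausdorff

u_arr = np.array([(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)])
v_arr = np.array([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -2.0)])

# Directed distances in both directions
d_uv, index_1, index_2 = directed_hausdorff(u_arr, v_arr)
d_vu = directed_hausdorff(v_arr, u_arr)[0]

# The general Hausdorff distance is the larger of the two
hausdorff = max(d_uv, d_vu)

# Brute force: for each point take the distance to its nearest neighbour
# in the other set, then take the largest of those distances
pairwise = cdist(u_arr, v_arr)
brute_uv = pairwise.min(axis=1).max()
brute_vu = pairwise.min(axis=0).max()

print(d_uv, d_vu, hausdorff)
```

For these particular arrays both directions happen to give the same value, but that is not true in general.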

This is how to compute the directed Hausdorff distance using the method *directed_hausdorff()* of Python Scipy.


## Python Scipy Distance Matrix Cosine

The Python Scipy module *scipy.spatial.distance* contains a method *cosine()* that computes the cosine distance between two one-dimensional arrays.

The cosine distance between u and v is defined as:

$$1 - \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}$$

The syntax is given below.

`scipy.spatial.distance.cosine(u, v, w=None)`

Where parameters are:

- **u (array_data):** First input array (one-dimensional).
- **v (array_data):** Second input array (one-dimensional).
- **w (array_data):** The weights for each value in u and v. The default is None, which gives each value a weight of 1.

The method *cosine()* returns `cosine` (the cosine distance between the vectors u and v) of type double.

Let’s take an example by following the below steps:

Import the method *cosine()* and compute the distance using the below Python code.

```
from scipy.spatial.distance import cosine
cosine([0, 1, 1], [1, 0, 1])
```

The output shows that the cosine distance between the two arrays is 0.5.
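The following sketch reproduces this value as one minus the cosine of the angle between the vectors, and also illustrates that the cosine distance ignores the length of the vectors. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

dist = cosine(u, v)

# One minus the cosine similarity: 1 - 1 / (sqrt(2) * sqrt(2)) = 0.5
manual = 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Scaling a vector does not change its direction, so the distance
# between u and 2 * u is (numerically) zero
same_direction = cosine(u, 2 * u)

print(dist, manual, same_direction)
```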

This is how to compute the cosine distance using the method *cosine()* of Python Scipy.


## Python Scipy Distance Correlation Matrix

Python Scipy has a method *correlation()* in the module *scipy.spatial.distance* that computes the correlation distance between two one-dimensional arrays.

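The correlation distance between u and v is defined as:

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\lVert u - \bar{u} \rVert_2 \, \lVert v - \bar{v} \rVert_2}$$

where $\bar{u}$ and $\bar{v}$ are the means of the elements of u and v.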

The syntax is given below.

`scipy.spatial.distance.correlation(u, v, w=None, centered=True)`

Where parameters are:

- **u (array_data):** First input array (one-dimensional).
- **v (array_data):** Second input array (one-dimensional).
- **w (array_data):** The weights for each value in u and v. The default is None, which gives each value a weight of 1.
- **centered (boolean):** If True, u and v are centered (their means are subtracted) before the distance is computed. True by default.

The method *correlation()* returns `correlation` (the correlation distance between the one-dimensional arrays u and v) of type double.

Let’s take an example by following the below steps:

Import the method *correlation()* and compute the distance using the below Python code.

```
from scipy.spatial.distance import correlation
correlation([0, 1, 1], [1, 0, 1])
```

The output shows that the correlation distance between the two arrays is 1.5.
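A value above 1 can look surprising, so here is a small sketch that derives it by hand: the two arrays are negatively correlated, and the correlation distance is one minus the Pearson correlation coefficient. The use of `np.corrcoef` as a cross-check is an addition for illustration.

```python
import numpy as np
from scipy.spatial.distance import correlation

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

dist = correlation(u, v)

# Center both vectors, then apply the cosine-distance formula
uc = u - u.mean()
vc = v - v.mean()
manual = 1 - np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

# Equivalent view: one minus the Pearson correlation coefficient (-0.5 here)
pearson = np.corrcoef(u, v)[0, 1]

print(dist, manual, 1 - pearson)
```

The distance ranges from 0 (perfectly positively correlated) to 2 (perfectly negatively correlated), so 1.5 indicates a moderate negative correlation.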

This is how to compute the correlation distance using the method *correlation()* of Python Scipy.


So in this tutorial, we have learned about the “Python Scipy Distance Matrix” and covered the following topics.

- Python Scipy Distance Matrix
- Python Scipy Distance Matrix Cdist
- Python Scipy Distance Matrix Pdist
- Python Scipy Distance Matrix Euclidean
- Python Scipy Distance Matrix Cityblock
- Python Scipy Distance Matrix Clustering
- Python Scipy Distance Matrix Directed Hausdorff
- Python Scipy Distance Matrix Cosine
- Python Scipy Distance Correlation Matrix

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I started working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I got expertise in various Python libraries also like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc… for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.