In this Python tutorial, we will learn about the “Python Scipy Distance Matrix”, where we calculate the pairwise distances between the rows of matrices or arrays using distance metrics such as Euclidean, Manhattan (cityblock), and cosine. We will cover the following topics.
- Python Scipy Distance Matrix
- Python Scipy Distance Matrix Cdist
- Python Scipy Distance Matrix Pdist
- Python Scipy Distance Matrix Euclidean
- Python Scipy Distance Matrix Cityblock
- Python Scipy Distance Matrix Clustering
- Python Scipy Distance Matrix Directed Hausdorff
- Python Scipy Distance Matrix Cosine
- Python Scipy Distance Correlation Matrix
Python Scipy Distance Matrix
A distance matrix contains the distances, calculated pairwise, between the vectors of one or two matrices. We can compute it using the distance_matrix() method provided by the scipy.spatial module. In most cases, a matrix is a 2-D array whose rows serve as the vectors (one-dimensional arrays).
The syntax is given below.
scipy.spatial.distance_matrix(x, y, p=2, threshold=1000000)
Where parameters are:
- x (array_like, shape (m, k)): Matrix of M vectors in K dimensions.
- y (array_like, shape (n, k)): Matrix of N vectors in K dimensions.
- p (float): Which Minkowski p-norm to use. For example, p=1 gives the Manhattan distance and p=2 gives the Euclidean distance.
- threshold (positive int): If M * N * K is greater than the threshold, the algorithm uses a Python loop instead of large temporary arrays.
The method distance_matrix() returns an ndarray of shape (M, N) containing the distance from each vector in x to each vector in y.
Let’s take an example by following the below steps:
Import the required libraries using the below Python code.
from scipy import spatial
import numpy as np
Create matrices using the below code.
x_mat = np.array([[2,3],[1,2],[3,3]])
y_mat = np.array([[1,2],[2,0],[5,0]])
Calculate the distance matrix using the below code.
d_matrix = spatial.distance_matrix(x_mat,y_mat,p=2)
View the distance matrix using the below code.
print(d_matrix)
This is how to compute the distance matrix using the method distance_matrix() of the Python scipy.spatial module.
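As a rough sanity check on the example above, the following sketch recomputes the same Euclidean matrix with plain NumPy broadcasting and also tries p=1 (Manhattan) for comparison. The names d_euclid, d_manhattan, and manual are illustrative choices, not part of SciPy.

```python
import numpy as np
from scipy import spatial

x_mat = np.array([[2, 3], [1, 2], [3, 3]])
y_mat = np.array([[1, 2], [2, 0], [5, 0]])

# p=2 gives Euclidean distances, p=1 gives Manhattan distances.
d_euclid = spatial.distance_matrix(x_mat, y_mat, p=2)
d_manhattan = spatial.distance_matrix(x_mat, y_mat, p=1)

# Manual Euclidean computation: diff has shape (3, 3, 2),
# one difference vector per (row of x_mat, row of y_mat) pair.
diff = x_mat[:, None, :] - y_mat[None, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))

print(d_euclid.shape)                 # one row per x vector, one column per y vector
print(np.allclose(d_euclid, manual))  # the two computations should agree
print(d_manhattan[0])                 # distances from [2, 3] to each row of y_mat
```

The entry at row 1, column 0 is zero because the second row of x_mat and the first row of y_mat are the same point, [1, 2].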
Python Scipy Distance Matrix Cdist
Python Scipy contains a method cdist() in the module scipy.spatial.distance that calculates the distance between each pair of observations from two input collections.
The syntax is given below.
scipy.spatial.distance.cdist(XA, XB, metric='euclidean')
Where parameters are:
- XA (array_like): An mA by n array of mA original observations in n dimensions.
- XB (array_like): An mB by n array of mB original observations in n dimensions.
- metric (str or callable): The distance metric to use. Supported names include “canberra”, “braycurtis”, “chebyshev”, “correlation”, “cityblock”, “cosine”, “euclidean”, “dice”, “hamming”, “kulsinski”, “jensenshannon”, “kulczynski1”, “matching”, “mahalanobis”, “minkowski”, “russellrao”, “rogerstanimoto”, and “seuclidean”.
The method cdist() returns a distance matrix of size mA by mB.
Let’s take an example by following the below steps:
Import the required libraries using the below Python code.
from scipy.spatial.distance import cdist
Create data using the below code.
coord_data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]
Calculate the Euclidean distances between the four two-dimensional coordinates.
cdist(coord_data,coord_data,'euclidean')
This is how to use the method cdist() of Python Scipy to calculate the distance between each pair of the two input collections.
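The example above passes the same collection twice. As a minimal sketch of the more general case, the code below uses two collections of different sizes and a user-defined metric. The sample points and the helper max_abs_diff are hypothetical and only serve to illustrate the mA by mB output shape and the callable form of the metric parameter.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical sample points: 3 observations in XA, 2 in XB, all 2-D.
XA = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
XB = np.array([[0.0, 1.0], [4.0, 3.0]])

# Built-in metric selected by name.
d_city = cdist(XA, XB, metric="cityblock")

# User-defined metric: the largest absolute coordinate difference,
# which should agree with the built-in "chebyshev" metric.
def max_abs_diff(a, b):
    return np.abs(a - b).max()

d_custom = cdist(XA, XB, metric=max_abs_diff)
d_cheb = cdist(XA, XB, metric="chebyshev")

print(d_city.shape)                   # (mA, mB)
print(np.allclose(d_custom, d_cheb))  # the callable matches the named metric
```

A callable metric is convenient for experiments, but the named metrics are usually much faster because they run in compiled code.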
Python Scipy Distance Matrix Pdist
Python Scipy contains a method pdist() in the module scipy.spatial.distance that calculates the pairwise distances between observations in n-dimensional space.
The syntax is given below.
scipy.spatial.distance.pdist(X, metric='euclidean')
Where parameters are:
- X (array_like): An m by n array of m original observations in n dimensions.
- metric (str or callable): The distance metric to use. Supported names include “canberra”, “braycurtis”, “chebyshev”, “correlation”, “cityblock”, “cosine”, “euclidean”, “dice”, “hamming”, “kulsinski”, “jensenshannon”, “kulczynski1”, “matching”, “mahalanobis”, “minkowski”, “russellrao”, “rogerstanimoto”, and “seuclidean”.
The method pdist() returns Y, a condensed distance matrix of type ndarray.
Let’s understand with illustrations by following the below steps:
Import the required libraries using the below Python code.
from scipy.spatial.distance import pdist
Create data using the below code.
data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]
Calculate the distances between the m points using the Euclidean distance (2-norm) as the metric. The points are arranged in the data as m row vectors of n dimensions.
pdist(data,'euclidean')
This is how to compute the pairwise distances between observations in n-dimensional space using the method pdist() of Python Scipy.
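Because pdist() returns a condensed (flat) result, it can help to see how it relates to the full square matrix. The sketch below uses squareform() from the same module to expand the condensed output and compares it with cdist() applied to the data against itself. The names condensed and square are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform, cdist

data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]

# Condensed form: one entry per unordered pair, m * (m - 1) / 2 values.
condensed = pdist(data, "euclidean")

# Expand to the symmetric m by m matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed.shape)  # 4 points give 6 pairwise distances
print(square.shape)     # full 4 by 4 matrix
print(np.allclose(square, cdist(data, data, "euclidean")))
```

The condensed form stores each pair only once, which is why functions such as linkage() accept it directly.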
Python Scipy Distance Matrix Euclidean
Python Scipy contains a method euclidean() in the module scipy.spatial.distance that calculates the Euclidean distance between two one-dimensional arrays.
The syntax is given below.
scipy.spatial.distance.euclidean(u, v, w=None)
Where parameters are:
- u (array_like): Input array (one-dimensional).
- v (array_like): Input array (one-dimensional).
- w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.
The method euclidean() returns the Euclidean distance between the vectors u and v, of type double.
Let’s do some examples by following the below steps:
Import the method euclidean() and compute the distance using the below Python code.
from scipy.spatial.distance import euclidean
euclidean([2, 1, 0], [1, 1, 2])
Running the above code shows that the Euclidean distance between the given arrays is approximately 2.236 (the square root of 5).
This is how to compute the Euclidean distance using the method euclidean() of Python Scipy.
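The w parameter is not used in the example above. The following sketch, with hypothetical weights chosen only for illustration, compares the weighted result against a manual NumPy computation of the square root of the weighted sum of squared differences.

```python
import numpy as np
from scipy.spatial.distance import euclidean

u = np.array([2, 1, 0])
v = np.array([1, 1, 2])
w = np.array([1.0, 1.0, 0.25])  # hypothetical weights, one per coordinate

d_plain = euclidean(u, v)        # every coordinate has weight 1
d_weighted = euclidean(u, v, w)  # third coordinate is down-weighted

# Manual check: sqrt(sum(w * (u - v)^2)).
manual_weighted = np.sqrt(np.sum(w * (u - v) ** 2))

print(d_plain)
print(d_weighted)
print(np.isclose(d_weighted, manual_weighted))
```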
Python Scipy Distance Matrix Cityblock
The Python Scipy module scipy.spatial.distance contains a method cityblock() that calculates the Manhattan distance, also known as the City Block distance.
The syntax is given below.
scipy.spatial.distance.cityblock(u, v, w=None)
Where parameters are:
- u (array_like): Input array (one-dimensional).
- v (array_like): Input array (one-dimensional).
- w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.
The method cityblock() returns the Manhattan (cityblock) distance between the vectors u and v, of type double.
Let’s do some examples by following the below steps:
Import the method cityblock() and compute the Manhattan distance using the below Python code.
from scipy.spatial.distance import cityblock
cityblock([1, 0, 2], [2, 1, 2])
Running the above code shows that the cityblock (Manhattan) distance between the given arrays is 2.
This is how to compute the cityblock distance using the method cityblock() of Python Scipy.
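To make the definition concrete, the sketch below checks the result of cityblock() against the sum of absolute coordinate differences computed with NumPy, and against minkowski() with p=1, which should describe the same distance. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cityblock, minkowski

u = np.array([1, 0, 2])
v = np.array([2, 1, 2])

d_city = cityblock(u, v)

# Manhattan distance by hand: sum of absolute differences.
manual = np.abs(u - v).sum()

# The Minkowski distance with p=1 is the same quantity.
d_mink = minkowski(u, v, p=1)

print(d_city, manual, d_mink)
```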
Python Scipy Distance Matrix Clustering
The Python Scipy module scipy.cluster.hierarchy contains a method linkage() that clusters data using hierarchical (agglomerative) methods. But “What is hierarchical clustering?”
- Hierarchical clustering, also referred to as hierarchical cluster analysis, is an algorithm that groups objects into clusters based on how similar they are. The result is a set of clusters, each distinct from the others, whose members are broadly similar to one another.
The syntax is given below.
scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)
Where parameters are:
- y (ndarray): A condensed distance matrix, which is a flat array containing the upper triangle of the distance matrix. This is the form that pdist() returns. Alternatively, an m by n array of m observation vectors in n dimensions can be passed. All elements of the condensed distance matrix must be finite; NaNs and infs are not allowed.
- method (str): The linkage algorithm to use, for example “single”, “complete”, “average”, or “ward”.
- metric (str or function): The distance metric to use when y is a collection of observation vectors; otherwise it is ignored. For a list of valid distance metrics, refer to the pdist function. A custom distance function can also be used.
- optimal_ordering (boolean): If True, the linkage matrix is reordered so that the distance between successive leaves is minimal. This gives a more intuitive tree structure when the data are visualized. It defaults to False because the algorithm can be slow, especially on large datasets.
The method linkage() returns Z, the linkage matrix that encodes the hierarchical clustering, of type ndarray.
Import the required libraries using the below Python code.
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
Create data on which we want to perform the clustering using the below code.
data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]
Pass the above-created data to the method linkage() with the ward method to compute the clustering using the below code.
res = linkage(data, 'ward')
Plot the dendrogram of the above result using the below code.
fig = plt.figure(figsize=(5, 5))
dendo = dendrogram(res)
plt.show()
In the above code, we have passed the result of linkage(data, 'ward') to the method dendrogram(res) to show the dendrogram of the resulting clusters.
This is how to calculate the distance matrix clustering using the method linkage() of Python Scipy.
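Since this section is about distance matrices, the following sketch shows the other accepted input form: a condensed distance matrix produced by pdist() is passed to linkage() instead of the raw observations. It then cuts the tree into two flat clusters with fcluster(), which is not used elsewhere in this tutorial. The names condensed, res_from_dist, res_from_obs, and labels are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

data = [[i] for i in [1, 7, 1, 5, 2, 8, 8, 1]]

# Condensed Euclidean distance matrix for the 8 observations.
condensed = pdist(data, metric="euclidean")

# linkage() accepts either the condensed distances or the observations.
res_from_dist = linkage(condensed, method="ward")
res_from_obs = linkage(data, method="ward")

# Cut the hierarchy so that at most two flat clusters remain.
labels = fcluster(res_from_dist, t=2, criterion="maxclust")

print(res_from_dist.shape)                       # (n - 1) merge steps, 4 columns
print(np.allclose(res_from_dist, res_from_obs))  # both inputs give the same tree
print(labels)                                    # one cluster label per observation
```

With this data the small values (1, 1, 1, 2) end up in one cluster and the large values (5, 7, 8, 8) in the other. Note that the ward method assumes the condensed distances are Euclidean.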
Python Scipy Distance Matrix Directed Hausdorff
The Python Scipy module scipy.spatial.distance contains a method directed_hausdorff() that finds the directed Hausdorff distance between two two-dimensional arrays.
The syntax is given below.
scipy.spatial.distance.directed_hausdorff(u, v, seed=0)
Where parameters are:
- u (array_like, shape (M, N)): Input array of M points in N dimensions.
- v (array_like, shape (O, N)): Input array of O points in N dimensions.
- seed (int): The seed used for the random shuffling of u and v. The default value of 0 guarantees reproducibility.
The method directed_hausdorff() returns d (the directed Hausdorff distance between the arrays u and v), index_1 (the index of the point in u that contributes to the Hausdorff pair), and index_2 (the index of the point in v that contributes to the Hausdorff pair), of type double, int, and int respectively.
Let’s take an example by following the below steps:
Import the required libraries using the below Python code.
from scipy.spatial import distance
import numpy as np
Create two 2-dimensional arrays using the below code.
u_arr = np.array([(2.0, 0.0),
(0.0, 2.0),
(-2.0, 0.0),
(0.0, -2.0)])
v_arr = np.array([(1.0, 0.0),
(0.0, 1.0),
(-1.0, 0.0),
(0.0, -2.0)])
Compute the directed Hausdorff distance between the above-created arrays using the below code.
distance.directed_hausdorff(u_arr, v_arr)[0]
Running the above code shows that the directed Hausdorff distance between the given arrays is 1.
This is how to compute the directed Hausdorff distance using the method directed_hausdorff() of Python Scipy.
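The word "directed" matters: the distance from u to v is generally not the same as the distance from v to u. The sketch below uses two small hypothetical point sets, a and b, to show this, and takes the maximum of the two directions as the symmetric Hausdorff distance.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

# Hypothetical point sets: b contains every point of a plus one far-away point.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])

# Every point of a also lies in b, so the distance from a to b is zero.
d_ab = directed_hausdorff(a, b)[0]

# The point (5, 0) in b is 4 units from its nearest neighbour in a.
d_ba = directed_hausdorff(b, a)[0]

# Symmetric Hausdorff distance: the larger of the two directed distances.
hausdorff = max(d_ab, d_ba)

print(d_ab, d_ba, hausdorff)
```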
Python Scipy Distance Matrix Cosine
The Python Scipy module scipy.spatial.distance contains a method cosine() that computes the cosine distance between two one-dimensional arrays.
The cosine distance between u and v is defined as:
$$1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$
The syntax is given below.
scipy.spatial.distance.cosine(u, v, w=None)
Where parameters are:
- u (array_like): Input array (one-dimensional).
- v (array_like): Input array (one-dimensional).
- w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.
The method cosine() returns the cosine distance between the vectors u and v, of type double.
Let’s take an example by following the below steps:
Import the method cosine() and compute the distance using the below Python code.
from scipy.spatial.distance import cosine
cosine([0, 1, 1], [1, 0, 1])
Running the above code shows that the cosine distance between the given arrays is 0.5.
This is how to compute the cosine distance using the method cosine() of Python Scipy.
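As a small check on the value 0.5, the sketch below recomputes the cosine distance by hand with NumPy, as one minus the cosine similarity (the dot product divided by the product of the 2-norms). The name manual is illustrative.

```python
import numpy as np
from scipy.spatial.distance import cosine

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d_cos = cosine(u, v)

# Cosine similarity is (u . v) / (||u|| * ||v||); the distance is 1 minus that.
similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
manual = 1.0 - similarity

print(d_cos, manual)
```

Here the dot product is 1 and each vector has norm equal to the square root of 2, so the similarity is 1/2 and the distance is also 1/2.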
Python Scipy Distance Correlation Matrix
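Python Scipy has a method correlation() in the module scipy.spatial.distance that computes the correlation distance between two one-dimensional arrays.
The correlation distance between u and v is defined as follows, where \bar{u} and \bar{v} are the means of the elements of u and v:
$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \, \|v - \bar{v}\|_2}$$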
The syntax is given below.
scipy.spatial.distance.correlation(u, v, w=None, centered=True)
Where parameters are:
- u (array_like): Input array (one-dimensional).
- v (array_like): Input array (one-dimensional).
- w (array_like): The weights for each value in u and v. The default is None, which gives each value a weight of 1.
- centered (boolean): If True, u and v are centered (their means are subtracted) before the distance is computed. The default is True.
The method correlation() returns the correlation distance between the one-dimensional arrays u and v, of type double.
Let’s take an example by following the below steps:
Import the method correlation() and compute the distance using the below Python code.
from scipy.spatial.distance import correlation
correlation([0, 1, 1], [1, 0, 1])
Running the above code shows that the correlation distance between the given arrays is 1.5.
This is how to compute the correlation distance using the method correlation() of Python Scipy.
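With the default centering, the correlation distance is one minus the Pearson correlation coefficient. The sketch below checks this for the example arrays using np.corrcoef(); the names pearson_r and manual are illustrative.

```python
import numpy as np
from scipy.spatial.distance import correlation

u = np.array([0, 1, 1])
v = np.array([1, 0, 1])

d_corr = correlation(u, v)

# Pearson correlation coefficient between u and v.
pearson_r = np.corrcoef(u, v)[0, 1]

# Correlation distance is 1 minus the Pearson coefficient.
manual = 1.0 - pearson_r

print(d_corr, pearson_r, manual)
```

A value above 1, as here, means the two arrays are negatively correlated; the distance ranges from 0 (perfect positive correlation) to 2 (perfect negative correlation).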
So in this tutorial, we have learned about the “Python Scipy Distance Matrix” and covered the following topics.
- Python Scipy Distance Matrix
- Python Scipy Distance Matrix Cdist
- Python Scipy Distance Matrix Pdist
- Python Scipy Distance Matrix Euclidean
- Python Scipy Distance Matrix Cityblock
- Python Scipy Distance Matrix Clustering
- Python Scipy Distance Matrix Directed Hausdorff
- Python Scipy Distance Matrix Cosine
- Python Scipy Distance Correlation Matrix
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.