In this Python Scipy tutorial, we will learn about the "Python Scipy Pairwise Distance". Using various distance metrics, such as Canberra, Jaccard, and Euclidean, we will compute the pairwise distances between points or arrays.
- What is Pairwise Distance?
- Python Scipy Pairwise Distance Matrix
- Python Scipy Pairwise Distance Jaccard
- Python Scipy Pairwise Distance Euclidean
- Python Scipy Pairwise Distance Manhattan
- Python Scipy Pairwise Distance Minkowski
- Python Scipy Pairwise Distance Hamming
- Python Scipy Pairwise Distance Canberra
- Python Scipy Pairwise Distance Chebyshev
- Python Scipy Pairwise Distance Jensenshannon
What is Pairwise Distance?
Pairwise distance is the distance between every pair of observations in a dataset. A classic use is the phylogeny problem: given a measure of the distance between each pair of species, a straightforward approach is to find the tree that best predicts the observed collection of distances.
This reduces the data matrix M to a simple table of pairwise distances, discarding some of the data. In practice, however, much of the evolutionary information is often carried by these distances.
In this tutorial, we will learn how to compute the pairwise distance with the help of Python Scipy methods.
Python Scipy Pairwise Distance Matrix
The module scipy.spatial.distance of the Python library Scipy offers a function called pdist() that computes the pairwise distances between observations in n-dimensional space.
The syntax is given below.
scipy.spatial.distance.pdist(X, metric='euclidean')
Where parameters are:
- X(array_data): An m by n array of m observations, each in n dimensions.
- metric(callable, str): The distance metric to use. It can be a callable or one of the metric names, such as "canberra", "braycurtis", "chebyshev", "correlation", "cityblock", "cosine", "euclidean", "dice", "hamming", "kulsinski", "jensenshannon", "kulczynski1", "matching", "mahalanobis", "minkowski", "russellrao", "rogerstanimoto", "seuclidean". The available names depend on the installed SciPy version; some of those listed here, such as "kulsinski", are deprecated or removed in newer releases.
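As a minimal sketch of the callable form of the metric parameter, the snippet below passes a hand-written function to pdist() and compares the result with the built-in "cityblock" metric. The function name manhattan and the sample points are assumptions made for illustration only.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative sample points (m = 4 observations, n = 2 dimensions).
X = np.array([[5, 8], [10, 12], [11, 15], [19, 16]], dtype=float)


def manhattan(u, v):
    """Hypothetical user-defined metric: sum of absolute coordinate differences."""
    return np.abs(u - v).sum()


# A callable metric receives two 1-D arrays and returns a single number.
d_callable = pdist(X, metric=manhattan)

# The built-in string metric computes the same quantity.
d_builtin = pdist(X, metric="cityblock")

print(np.allclose(d_callable, d_builtin))
```

A callable is convenient for custom measures, but the named metrics are generally the faster choice when one exists.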
The method pdist() returns Y, the condensed distance matrix.
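To illustrate what "condensed" means here: for m observations, the result is a flat array holding the m*(m-1)/2 distances for the pairs i < j. The sketch below uses squareform() from the same module to expand it into a full m by m matrix. The sample points are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four illustrative observations in two dimensions.
X = np.array([[5, 8], [10, 12], [11, 15], [19, 16]], dtype=float)

# Condensed form: one entry per unordered pair (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
condensed = pdist(X, metric="euclidean")

# Square form: symmetric matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed.shape)
print(square.shape)
```

The condensed form avoids storing each distance twice, which matters for large m; the square form is easier to index by observation.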
Let’s understand with an example by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create data using the below code.
data = [(25.056, -75.7226),
(25.7411, -79.1197),
(25.2897, -79.2294),
(25.6716, -79.3378)]
Use correlation as the distance metric to calculate the distance between each pair of points.
pdist(data,'correlation')
This is how to compute the pairwise distance matrix using the method pdist() of Python Scipy.
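One caveat about the correlation metric: it is one minus the Pearson correlation of the two observations, so with only two coordinates per point it can only come out as 0 or 2. The sketch below shows this for the points above and then uses made-up four-dimensional observations (an assumption for illustration), checking the result against numpy.corrcoef.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The two-dimensional points from the example above.
data = [(25.056, -75.7226),
        (25.7411, -79.1197),
        (25.2897, -79.2294),
        (25.6716, -79.3378)]
d_2d = pdist(data, "correlation")
print(d_2d)  # all (numerically) zero: every point has x > y

# Illustrative observations with more dimensions give more informative values.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.5],
              [4.0, 3.0, 2.0, 1.0]])
d = pdist(X, "correlation")

# Correlation distance is 1 - Pearson correlation between the rows.
expected = 1.0 - np.corrcoef(X)
print(np.allclose(squareform(d), expected))
```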
Python Scipy Pairwise Distance Jaccard
The Jaccard distance measures the dissimilarity between two sets and is a metric on the collection of all finite sets. It is frequently used to compute an n*n distance matrix for clustering and multidimensional scaling of n sample sets.
So here in this section, we will use the metric jaccard to compute the distance. Let’s check with an example by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create data using the below code.
samp_data = [(27.056, -65.3226),
(27.6411, -65.2197),
(27.6897, -65.6294),
(27.5716, -65.1378)]
Use Jaccard as the distance metric to calculate the distance between each pair of points.
pdist(samp_data,'jaccard')
This is how to compute the pairwise Jaccard distance matrix using the method pdist() with metric jaccard of Python Scipy.
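Because the Jaccard distance is defined on sets, it is most meaningful for boolean (membership) vectors; how real-valued input such as the coordinates above is handled may differ between SciPy versions. The following sketch uses made-up boolean rows, each marking which of four items a sample contains.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Each row is a set encoded as membership flags over four items (illustrative data).
sets = np.array([[1, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 0, 1, 1]], dtype=bool)

# Jaccard distance = (positions where exactly one row is True)
#                    / (positions where at least one row is True)
d = pdist(sets, "jaccard")
print(d)
```

For rows 0 and 1, two positions disagree out of three where at least one is set, giving 2/3; rows 0 and 2 share no items, giving 1.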
Python Scipy Pairwise Distance Euclidean
The Euclidean distance is the straight-line (shortest) distance between two points. Many machine learning algorithms, such as K-Means, use this metric to gauge how similar two observations are.
The Python Scipy method pdist() accepts the metric euclidean for computing this kind of distance. So here we will compute the pairwise distance using the Euclidean metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(5, 8),
(10, 12),
(11, 15),
(19, 16)]
Use Euclidean as the distance metric to calculate the distance between each pair of points.
pdist(samp_data,'euclidean')
This is how to compute the pairwise Euclidean distance matrix using the method pdist() with metric euclidean of Python Scipy.
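As a cross-check of the result, the Euclidean distances can also be computed by hand with NumPy broadcasting. This is only a sketch for verification; pdist() is the more convenient route in practice.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 8), (10, 12), (11, 15), (19, 16)], dtype=float)

d = pdist(samp_data, "euclidean")

# Manual computation: differences between every pair of rows, then the 2-norm.
diffs = samp_data[:, None, :] - samp_data[None, :, :]
full = np.sqrt((diffs ** 2).sum(axis=-1))

# Keep only the pairs i < j, in the same order pdist() uses.
upper = np.triu_indices(len(samp_data), k=1)
manual = full[upper]

print(np.allclose(d, manual))
```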
Python Scipy Pairwise Distance Manhattan
The Manhattan distance between two points is the sum of the absolute differences of their coordinates over all dimensions. The Python Scipy method pdist() accepts the metric cityblock for computing this kind of distance.
Let’s compute the pairwise distance using the Manhattan (also known as city-block in Python Scipy) metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(5, 8),
(10, 12),
(11, 15),
(19, 16)]
Use cityblock as the distance metric to calculate the distance between each pair of points.
pdist(samp_data,'cityblock')
This is how to compute the pairwise Manhattan distance matrix using the method pdist() with metric cityblock of Python Scipy.
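To see which entry of the output belongs to which pair of points, the following sketch lists the index pairs next to the distances. It relies on itertools.combinations producing pairs in the same (i < j, row-major) order as the condensed output.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 8), (10, 12), (11, 15), (19, 16)], dtype=float)

d = pdist(samp_data, "cityblock")

# Index pairs in the order used by the condensed distance array.
pairs = list(combinations(range(len(samp_data)), 2))

for (i, j), dist in zip(pairs, d):
    print(f"points {i} and {j}: {dist}")
```

For example, the first entry is |5 - 10| + |8 - 12| = 9 for points 0 and 1.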
Python Scipy Pairwise Distance Minkowski
The Minkowski distance is a distance between two points in N-dimensional space. In essence, it generalizes both the Manhattan distance (order p = 1) and the Euclidean distance (order p = 2).
It is frequently used in machine learning, particularly when looking for the best correlation or classification of data.
The Python Scipy method pdist() accepts the metric minkowski for computing this kind of distance. Let’s compute the pairwise distance using the Minkowski metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(10, 8),
(10, 12),
(10, 15),
(19, 16)]
To determine the distance between each pair of points, use Minkowski as the distance metric.
pdist(samp_data,'minkowski')
This is how to compute the pairwise Minkowski distance matrix using the method pdist() with metric minkowski of Python Scipy.
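The call above uses the default order p = 2, which is the same as the Euclidean distance. As a short sketch, the order can be changed through the p keyword argument of pdist(); the values of p below are chosen only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(10, 8), (10, 12), (10, 15), (19, 16)], dtype=float)

# p = 1 reproduces the Manhattan (cityblock) distance.
d_p1 = pdist(samp_data, "minkowski", p=1)

# p = 2 reproduces the Euclidean distance (this is the default order).
d_p2 = pdist(samp_data, "minkowski", p=2)

# Larger orders give more weight to the largest coordinate difference.
d_p3 = pdist(samp_data, "minkowski", p=3)

print(np.allclose(d_p1, pdist(samp_data, "cityblock")))
print(np.allclose(d_p2, pdist(samp_data, "euclidean")))
print(d_p3)
```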
Python Scipy Pairwise Distance Hamming
The Hamming distance between two integers is the number of bit positions at which they differ. For two vectors, the hamming metric in Scipy returns the proportion of components that disagree. The Python Scipy method pdist() accepts the metric hamming for computing this kind of distance.
Let’s take an example and compute the pairwise distance using the Hamming metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(10, 8),
(10, 12),
(10, 15),
(19, 16)]
To determine the distance between each pair of points, use hamming as the distance metric.
pdist(samp_data,'hamming')
This is how to compute the pairwise Hamming distance matrix using the method pdist() with metric hamming of Python Scipy.
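Since the hamming metric returns a proportion rather than a count, the sketch below shows how to convert the output into the number of differing components by multiplying with the number of dimensions.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(10, 8), (10, 12), (10, 15), (19, 16)])

# Fraction of coordinates that differ for each pair of points.
proportions = pdist(samp_data, "hamming")

# Multiply by the number of dimensions to get the count of differing coordinates.
counts = proportions * samp_data.shape[1]

print(proportions)
print(counts)
```

The first three points share their first coordinate, so any pair among them differs in one of two positions (0.5); the last point differs from all others in both positions (1.0).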
Python Scipy Pairwise Distance Canberra
Godfrey N. Lance and William T. Williams first proposed the Canberra distance in 1966 and later refined it. The Canberra distance is a numerical measure of the difference between two points in a vector space. It is a weighted version of the L1 (Manhattan) distance.
Let’s take an example and compute the pairwise distance using the Canberra metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(9, 8),
(7, 12),
(3, 15),
(12, 16)]
To determine the distance between each pair of points, use Canberra as the distance metric.
pdist(samp_data,'canberra')
This is how to compute the pairwise Canberra distance matrix using the method pdist() with metric canberra of Python Scipy.
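To make the weighting concrete: each coordinate contributes |u_i - v_i| / (|u_i| + |v_i|) to the Canberra distance. The sketch below computes this by hand and compares it with pdist(); the helper name canberra_manual is hypothetical, and it assumes no coordinate pair is zero in both points.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(9, 8), (7, 12), (3, 15), (12, 16)], dtype=float)


def canberra_manual(u, v):
    """Hypothetical helper: sum of |u_i - v_i| / (|u_i| + |v_i|) over coordinates."""
    return np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v)))


d = pdist(samp_data, "canberra")

# Same pair order as the condensed output of pdist().
manual = np.array([canberra_manual(samp_data[i], samp_data[j])
                   for i, j in combinations(range(len(samp_data)), 2)])

print(np.allclose(d, manual))
```

For the first pair, this gives 2/16 + 4/20 = 0.325.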
Python Scipy Pairwise Distance Chebyshev
The Chebyshev distance, also referred to as the "maximum metric" in mathematics, is the distance between two points measured as the largest absolute difference over all of their coordinates.
The Python Scipy method pdist() accepts the metric chebyshev for computing this kind of pairwise distance.
Let’s take an example and compute the pairwise distance using the Chebyshev metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(5, 9),
(12, 7),
(15, 3),
(16, 12)]
To determine the largest coordinate difference between each pair of points, use Chebyshev as the distance metric.
pdist(samp_data,'chebyshev')
This is how to compute the pairwise Chebyshev distance matrix using the method pdist() with metric chebyshev of Python Scipy.
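As a quick check of the definition, the following sketch recomputes the Chebyshev distances by taking the maximum absolute coordinate difference for each pair.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 9), (12, 7), (15, 3), (16, 12)], dtype=float)

d = pdist(samp_data, "chebyshev")

# Largest absolute coordinate difference for every pair i < j.
manual = np.array([np.abs(samp_data[i] - samp_data[j]).max()
                   for i, j in combinations(range(len(samp_data)), 2)])

print(d)
print(np.allclose(d, manual))
```

For points (5, 9) and (12, 7), the coordinate differences are 7 and 2, so the distance is 7.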
Python Scipy Pairwise Distance Jensenshannon
The Jensen-Shannon distance measures the difference between two probability distributions. The Python Scipy method pdist() accepts the metric jensenshannon for computing this kind of pairwise distance.
Let’s take an example and compute the pairwise distance using the Jensenshannon metric by following the below steps:
Import the required libraries using the below python code.
from scipy.spatial.distance import pdist
Create sample data using the below code.
samp_data = [(5, 9),
(12, 7),
(15, 3),
(16, 12)]
To determine the difference between each pair of rows, treated as probability distributions, use Jensenshannon as the distance metric.
pdist(samp_data,'jensenshannon')
This is how to compute the pairwise Jensenshannon distance matrix using the method pdist() with metric jensenshannon of Python Scipy.
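Because the Jensen-Shannon distance compares probability distributions, it can help to turn each row into a probability vector (non-negative, summing to 1) before calling pdist(). The sketch below does this explicitly, as an assumption made to keep the example unambiguous, and compares the result with the standalone jensenshannon() function from the same module.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import jensenshannon, pdist

samp_data = np.array([(5, 9), (12, 7), (15, 3), (16, 12)], dtype=float)

# Normalize each row so that it sums to 1 and can be read as a distribution.
probs = samp_data / samp_data.sum(axis=1, keepdims=True)

d = pdist(probs, "jensenshannon")

# Pair-by-pair computation with the standalone function (natural logarithm by default).
manual = np.array([jensenshannon(probs[i], probs[j])
                   for i, j in combinations(range(len(probs)), 2)])

print(d)
print(np.allclose(d, manual))
```

With the natural logarithm, the distance lies between 0 (identical distributions) and sqrt(ln 2), roughly 0.83.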
Conclusion
In this tutorial, we learned how to compute the pairwise distance matrix in Scipy using several distance metrics, including Hamming, Euclidean, and Jensen-Shannon. We covered the following topics.
- What is Pairwise Distance?
- Python Scipy Pairwise Distance Matrix
- Python Scipy Pairwise Distance Jaccard
- Python Scipy Pairwise Distance Euclidean
- Python Scipy Pairwise Distance Manhattan
- Python Scipy Pairwise Distance Minkowski
- Python Scipy Pairwise Distance Hamming
- Python Scipy Pairwise Distance Canberra
- Python Scipy Pairwise Distance Chebyshev
- Python Scipy Pairwise Distance Jensenshannon
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, and New Zealand.