Python Scipy Pairwise Distance [With 9 Examples]

In this Python Scipy tutorial, we will learn about the “Python Scipy Pairwise Distance”. Using various distance metrics, like Canberra, Jaccard, Euclidean, and others, we will compute the pairwise distance between points or arrays.

  • What is Pairwise Distance?
  • Python Scipy Pairwise Distance Matrix
  • Python Scipy Pairwise Distance Jaccard
  • Python Scipy Pairwise Distance Euclidean
  • Python Scipy Pairwise Distance Hamming
  • Python Scipy Pairwise Distance Manhattan
  • Python Scipy Pairwise Distance Minkowski
  • Python Scipy Pairwise Distance Canberra
  • Python Scipy Pairwise Distance Chebyshev
  • Python Scipy Pairwise Distance Jensenshannon

What is Pairwise Distance?

A pairwise distance is the distance between each pair of observations in a data set. A classic use is the phylogeny problem: given a measure of the distance between each pair of species, a straightforward solution is to find the tree that best predicts the observed collection of distances.

This approach reduces the data matrix M to a simple table of pairwise distances, which discards some of the data. However, much of the evolutionary information is often still conveyed by these distances.

In this tutorial, we will learn how to compute the pairwise distance with the help of Python Scipy methods.


Python Scipy Pairwise Distance Matrix

The scipy.spatial.distance module of the Python library Scipy offers a function called pdist() that computes the pairwise distances between observations in n-dimensional space.

The syntax is given below.

scipy.spatial.distance.pdist(X, metric='euclidean')

Where parameters are:

  • X (array_like): An m by n array of m observations, each in n dimensions.
  • metric (str or callable): The distance metric to apply. The supported names include “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jaccard”, “jensenshannon”, “kulsinski”, “kulczynski1”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, and “seuclidean”. The exact set of supported names depends on the installed SciPy version.

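The method pdist() returns Y, the condensed distance matrix: a one-dimensional array that holds the m * (m - 1) / 2 distances between every pair of observations.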

Let’s understand with an example by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create data using the below code.

data = [(25.056, -75.7226),
          (25.7411, -79.1197),
          (25.2897, -79.2294),
          (25.6716, -79.3378)]

Use correlation as the distance metric to calculate the distance between every pair of points.

pdist(data,'correlation')

This is how to compute the pairwise distance matrix using the method pdist() of Python Scipy.
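The condensed result can be hard to read, so the following minimal sketch expands it into a full square matrix with squareform() from the same module. The three points are hypothetical values chosen for easy arithmetic and are not part of the tutorial's data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample points (chosen so the distances are round numbers).
points = np.array([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])

# Condensed form: distances for the pairs (0,1), (0,2), (1,2), in that order.
condensed = pdist(points, metric="euclidean")

# Square form: a symmetric m x m matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed)  # the three pairwise distances: 5, 10 and 5
print(square)     # 3 x 3 symmetric matrix
```

Here square[i, j] holds the distance between observation i and observation j, which is often more convenient for lookups than the condensed array.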


Python Scipy Pairwise Distance Jaccard

The Jaccard distance measures how dissimilar two sets are. It is frequently used to build the n * n distance matrix needed for clustering and multidimensional scaling of n sample sets, and it is a metric on the collection of all finite sets.

So here in this section, we will use the metric jaccard to compute the distance. Let’s check with an example by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create data using the below code.

samp_data = [(27.056, -65.3226),
          (27.6411, -65.2197),
          (27.6897, -65.6294),
          (27.5716, -65.1378)]

Use Jaccard as the distance metric to calculate the distance between every pair of points.

pdist(samp_data,'jaccard')

This is how to compute the pairwise Jaccard distance matrix using the method pdist() with metric jaccard of Python Scipy.
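Because the Jaccard distance is defined on sets, it is easiest to interpret on boolean data. The sketch below is an illustration with a hypothetical membership table (rows are sets, columns are items); the names sets and jaccard_dist are assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical membership table: each row is a set, each column an item.
sets = np.array([
    [1, 1, 0, 0],   # set A contains items 0 and 1
    [1, 0, 1, 0],   # set B contains items 0 and 2
    [0, 0, 1, 1],   # set C contains items 2 and 3
], dtype=bool)

# For boolean rows, the Jaccard distance is
# (items in exactly one set) / (items in at least one set).
jaccard_dist = pdist(sets, metric="jaccard")

print(jaccard_dist)  # pairs (A,B), (A,C), (B,C)
```

A and B share one of their three combined items, so their distance is 2/3, while A and C share nothing and are at the maximum distance of 1.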

Python Scipy Pairwise Distance Euclidean

The straight-line distance between two points is known as the “Euclidean distance”. Many machine learning algorithms, such as K-Means, use this metric to gauge how similar two observations are.

The Python Scipy method pdist() accepts the metric euclidean for computing this kind of distance. So here we will compute the pairwise distance using the Euclidean metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(5, 8),
          (10, 12),
          (11, 15),
          (19, 16)]

Use Euclidean as the distance metric to calculate the distance between every pair of points.

pdist(samp_data,'euclidean')

This is how to compute the pairwise Euclidean distance matrix using the method pdist() with metric euclidean of Python Scipy.
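As a sanity check, the first entry of the result can be reproduced by hand. This sketch reuses the sample data from this section; the variable names manual and from_pdist are chosen only for the illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 8), (10, 12), (11, 15), (19, 16)], dtype=float)

# Euclidean distance between the first two points, computed by hand:
# sqrt((5 - 10)^2 + (8 - 12)^2) = sqrt(41)
manual = np.sqrt(np.sum((samp_data[0] - samp_data[1]) ** 2))

# pdist stores the (0, 1) pair as the first entry of the condensed result.
from_pdist = pdist(samp_data, metric="euclidean")[0]

print(manual, from_pdist)
```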


Python Scipy Pairwise Distance Manhattan

The Manhattan distance is the sum of the absolute differences between two points across all dimensions. The Python Scipy method pdist() accepts the metric cityblock for computing this kind of distance.

Let’s compute the pairwise distance using the Manhattan (also known as city-block in Python Scipy) metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(5, 8),
          (10, 12),
          (11, 15),
          (19, 16)]

Use cityblock as the distance metric to calculate the distance between every pair of points.

pdist(samp_data,'cityblock')

This is how to compute the pairwise Manhattan distance matrix using the method pdist() with metric cityblock of Python Scipy.
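To see what cityblock computes, the sketch below rebuilds every pairwise value from the definition (the sum of absolute coordinate differences) and compares it with pdist(). It reuses this section's sample data; manual and from_pdist are illustrative names.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 8), (10, 12), (11, 15), (19, 16)])

# Sum of absolute differences for every pair (i, j) with i < j,
# visited in the same order that pdist uses.
manual = np.array([
    np.abs(samp_data[i] - samp_data[j]).sum()
    for i, j in combinations(range(len(samp_data)), 2)
])

from_pdist = pdist(samp_data, metric="cityblock")

print(manual)      # [ 9 13 22  4 13  9]
print(from_pdist)  # the same values as floats
```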

Python Scipy Pairwise Distance Minkowski

The Minkowski distance is a distance between two points in N-dimensional space. In essence, it is a generalization of both the Manhattan distance and the Euclidean distance, controlled by an order parameter p: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

It is frequently employed in machine learning, for example in distance-based classification and clustering of data.

The Python Scipy method pdist() accepts the metric minkowski for computing this kind of distance. Let’s compute the pairwise distance using the Minkowski metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(10, 8),
          (10, 12),
          (10, 15),
          (19, 16)]

To determine the distance between points m and n, use the Minkowski as the distance metric.

pdist(samp_data,'minkowski')

This is how to compute the pairwise Minkowski distance matrix using the method pdist() with metric minkowski of Python Scipy.
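The call above relies on the default order of the Minkowski metric. As a minimal sketch, the order can be set explicitly through the p keyword argument of pdist(); with p=1 the result matches cityblock and with p=2 it matches euclidean. The variable names d_p1, d_p2 and d_p3 are chosen only for this illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = [(10, 8), (10, 12), (10, 15), (19, 16)]

# The order p of the Minkowski distance is passed as a keyword argument.
d_p1 = pdist(samp_data, metric="minkowski", p=1)  # same as 'cityblock'
d_p2 = pdist(samp_data, metric="minkowski", p=2)  # same as 'euclidean'
d_p3 = pdist(samp_data, metric="minkowski", p=3)

print(d_p1)
print(d_p2)
print(d_p3)
```

For the same pair of points the distance never increases as p grows, so each value in d_p3 is at most the matching value in d_p2.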


Python Scipy Pairwise Distance Hamming

In its classic form, the Hamming distance between two integers is the number of bit positions at which they differ. For two vectors, the hamming metric in Scipy returns the proportion of components that disagree. The Python Scipy method pdist() accepts the metric hamming for computing this kind of distance.

Let’s take an example and compute the pairwise distance using the Hamming metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(10, 8),
          (10, 12),
          (10, 15),
          (19, 16)]

To determine the distance between points m and n, use hamming as the distance metric.

pdist(samp_data,'hamming')

This is how to compute the pairwise Hamming distance matrix using the method pdist() with metric hamming of Python Scipy.
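Note that the hamming metric in Scipy returns the fraction of positions that differ rather than a raw count. The sketch below uses hypothetical bit vectors (not the tutorial's data) and multiplies by the vector length to recover the count of mismatches.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical bit vectors, one per row.
codes = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# Fraction of positions at which each pair of rows disagrees.
fractions = pdist(codes, metric="hamming")

# Multiply by the number of positions to get the mismatch count.
counts = fractions * codes.shape[1]

print(fractions)  # [0.5  0.25 0.75]
print(counts)     # [2. 1. 3.]
```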

Python Scipy Pairwise Distance Canberra

Godfrey N. Lance and William T. Williams first proposed the Canberra distance in 1966, and it was later refined. The Canberra distance is a numerical measure of the difference between two points in a vector space. It is a weighted version of the L1 (Manhattan) distance.

Let’s take an example and compute the pairwise distance using the Canberra metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(9, 8),
          (7, 12),
          (3, 15),
          (12, 16)]

To determine the numerical representation of the difference between two points in a vector space, use Canberra as the distance metric.

pdist(samp_data,'canberra')

This is how to compute the pairwise Canberra distance matrix using the method pdist() with metric canberra of Python Scipy.
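The weighting becomes clearer when one entry is computed by hand. For two vectors u and v, the Canberra distance is the sum over all coordinates of |u_i - v_i| / (|u_i| + |v_i|). The sketch below checks this against pdist() for the first two points of this section's sample data; manual and from_pdist are illustrative names.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(9, 8), (7, 12), (3, 15), (12, 16)], dtype=float)

u, v = samp_data[0], samp_data[1]

# Each absolute difference is divided by the sum of the absolute values:
# |9 - 7| / (9 + 7) + |8 - 12| / (8 + 12) = 0.125 + 0.2 = 0.325
manual = np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v)))

# The (0, 1) pair is the first entry of the condensed result.
from_pdist = pdist(samp_data, metric="canberra")[0]

print(manual, from_pdist)
```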


Python Scipy Pairwise Distance Chebyshev

The Chebyshev distance, also referred to as the “maximum metric” in mathematics, is the largest absolute difference between two points along any coordinate axis.

The Python Scipy method pdist() accepts the metric chebyshev for computing this kind of pairwise distance.

Let’s take an example and compute the pairwise distance using the Chebyshev metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(5, 9),
          (12, 7),
          (15, 3),
          (16, 12)]

To determine the largest difference between two points, use Chebyshev as the distance metric.

pdist(samp_data,'chebyshev')

This is how to compute the pairwise Chebyshev distance matrix using the method pdist() with metric chebyshev of Python Scipy.
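As a quick check of the definition, the following sketch computes the largest coordinate difference for the first two sample points by hand and compares it with the first entry returned by pdist(). The names manual and from_pdist are chosen for this illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

samp_data = np.array([(5, 9), (12, 7), (15, 3), (16, 12)])

u, v = samp_data[0], samp_data[1]

# Largest absolute difference over the axes: max(|5 - 12|, |9 - 7|) = 7
manual = np.max(np.abs(u - v))

from_pdist = pdist(samp_data, metric="chebyshev")

print(manual)      # 7
print(from_pdist)  # [ 7. 10. 11.  4.  5.  9.]
```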


Python Scipy Pairwise Distance Jensenshannon

The Jensen-Shannon distance measures the difference between two probability distributions. The Python Scipy method pdist() accepts the metric jensenshannon for computing this kind of pairwise distance.

Let’s take an example and compute the pairwise distance using the Jensenshannon metric by following the below steps:

Import the required libraries using the below python code.

from scipy.spatial.distance import pdist

Create sample data using the below code.

samp_data = [(5, 9),
          (12, 7),
          (15, 3),
          (16, 12)]

To determine the difference between two probability distributions, use Jensenshannon as the distance metric.

pdist(samp_data,'jensenshannon')

This is how to compute the pairwise Jensenshannon distance matrix using the method pdist() with metric jensenshannon of Python Scipy.
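Since this metric compares probability distributions, it is easiest to interpret when every row is non-negative and sums to 1. The sketch below uses hypothetical probability vectors (not the tutorial's data) and cross-checks one pair against the standalone jensenshannon() function from the same module, assuming both use the natural logarithm by default.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon, pdist

# Hypothetical probability vectors: each row is non-negative and sums to 1.
probs = np.array([
    [0.5, 0.4, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
])

# Pairwise Jensen-Shannon distances for the pairs (0,1), (0,2), (1,2).
from_pdist = pdist(probs, metric="jensenshannon")

# The same distance for the first pair, computed with the standalone function.
direct = jensenshannon(probs[0], probs[1])

print(from_pdist)
print(direct)
```

With the natural logarithm, the distance is bounded above by sqrt(ln 2), roughly 0.83, which makes results comparable across pairs.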

Conclusion

In this tutorial, we learned how to compute the pairwise distance matrix in Scipy using several distance metrics, including Hamming, Euclidean, Jensen-Shannon, and others. We covered the following topics.

  • What is Pairwise Distance?
  • Python Scipy Pairwise Distance Matrix
  • Python Scipy Pairwise Distance Jaccard
  • Python Scipy Pairwise Distance Euclidean
  • Python Scipy Pairwise Distance Hamming
  • Python Scipy Pairwise Distance Manhattan
  • Python Scipy Pairwise Distance Minkowski
  • Python Scipy Pairwise Distance Canberra
  • Python Scipy Pairwise Distance Chebyshev
  • Python Scipy Pairwise Distance Jensenshannon
