Python SciPy Eigenvalues

Recently, I was working on a data science project that required analyzing the principal components of a large dataset. The key to this analysis was computing eigenvalues efficiently. While NumPy offers eigenvalue computation, SciPy provides more specialized and often faster methods that can handle various matrix types.

In this article, I’ll walk you through multiple ways to compute eigenvalues using SciPy, with practical examples that demonstrate when to use each method.

So let’s dive in!

Eigenvalues and Why They Are Important

Eigenvalues and their corresponding eigenvectors are fundamental concepts in linear algebra that have wide-ranging applications in machine learning, physics, engineering, and data analysis.

An eigenvalue represents how much a linear transformation stretches or compresses space in the direction of its associated eigenvector.

In practical terms, eigenvalues help us understand:

  • Principal directions in Principal Component Analysis (PCA)
  • Stability of systems in control theory
  • Vibration modes in structural engineering
  • Features in spectral clustering algorithms

Method 1: Use scipy.linalg.eig

The most straightforward way to calculate eigenvalues in SciPy is using the scipy.linalg.eig function. This method works with general square matrices.

import numpy as np
from scipy import linalg

# Create a sample matrix (representing a dataset correlation matrix)
A = np.array([[4, 2, 1], 
              [2, 3, 2], 
              [1, 2, 5]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = linalg.eig(A)

print("Eigenvalues:")
print(eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)

This code produces:

Eigenvalues:
[7.38761906+0.j 3.48645647+0.j 1.12592447+0.j]

Eigenvectors:
[[-0.51488378 -0.71716055  0.46965459]
 [-0.53867823 -0.15552024 -0.82803335]
 [-0.66687365  0.67933364  0.30624393]]


Note that eig always returns the eigenvalues as complex numbers, even for purely real input matrices (hence the +0.j terms above). When the imaginary parts are zero or negligibly small, it is safe to keep only the real part.
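As a quick sketch of that cleanup step, NumPy's real_if_close helper discards imaginary parts that are numerically close to zero (using NumPy's default tolerance here):

```python
import numpy as np
from scipy import linalg

A = np.array([[4, 2, 1],
              [2, 3, 2],
              [1, 2, 5]])

eigenvalues, _ = linalg.eig(A)

# real_if_close returns a real-typed array when every imaginary
# part is within tolerance of zero; otherwise it returns the input
real_eigs = np.real_if_close(eigenvalues)
print(real_eigs.dtype)
```

After the conversion, the array has a real dtype, so downstream code (sorting, plotting) no longer has to deal with complex values.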

Method 2: Use scipy.linalg.eigvals

If you’re only interested in the eigenvalues and not the eigenvectors, you can use the eigvals function to save computation time:

from scipy import linalg
import numpy as np

# Stock market returns correlation matrix
stock_corr = np.array([
    [1.0, 0.62, 0.41, 0.35],
    [0.62, 1.0, 0.39, 0.33],
    [0.41, 0.39, 1.0, 0.28],
    [0.35, 0.33, 0.28, 1.0]
])

# Calculate eigenvalues only
eigenvalues = linalg.eigvals(stock_corr)

print("Eigenvalues of stock correlation matrix:")
print(eigenvalues)

Output:

Eigenvalues of stock correlation matrix:
[2.21083921+0.j 0.3791224 +0.j 0.66531989+0.j 0.7447185 +0.j]


This is particularly useful in financial portfolio analysis where eigenvalues help identify the number of significant risk factors.
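As a sketch of that idea, one common rule of thumb (the Kaiser criterion) treats eigenvalues of a correlation matrix greater than 1 as significant factors. The threshold and the interpretation below are illustrative conventions, not anything built into SciPy:

```python
import numpy as np
from scipy import linalg

stock_corr = np.array([
    [1.0, 0.62, 0.41, 0.35],
    [0.62, 1.0, 0.39, 0.33],
    [0.41, 0.39, 1.0, 0.28],
    [0.35, 0.33, 0.28, 1.0]
])

eigenvalues = np.real(linalg.eigvals(stock_corr))

# Fraction of total variance carried by each factor
explained = eigenvalues / eigenvalues.sum()

# Kaiser-style rule of thumb: eigenvalues > 1 count as significant factors
significant = np.sum(eigenvalues > 1)
print(f"{significant} significant factor(s)")
```

For this matrix only one eigenvalue exceeds 1, suggesting a single dominant market-wide risk factor driving all four stocks.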


Method 3: For Symmetric Matrices using scipy.linalg.eigh

When working with symmetric matrices (like covariance or correlation matrices), the eigh function is more efficient and guarantees real eigenvalues:

import numpy as np
from scipy import linalg

# Creating a symmetric covariance matrix
cov_matrix = np.array([[10.5, 2.3, 1.1],
                       [2.3, 8.7, 3.2],
                       [1.1, 3.2, 6.4]])

# Using eigh for symmetric matrices
eigenvalues, eigenvectors = linalg.eigh(cov_matrix)

print("Eigenvalues (sorted in ascending order):")
print(eigenvalues)
print("\nCorresponding eigenvectors:")
print(eigenvectors)

Output:

Eigenvalues (sorted in ascending order):
[ 4.11680929  8.22483754 13.25835317]

Corresponding eigenvectors:
[[ 0.07728347  0.73290521 -0.67592693]
 [-0.59654392 -0.50921708 -0.62034935]
 [ 0.79885081 -0.45116286 -0.39785609]]


Note that eigh returns eigenvalues in ascending order, unlike eig which returns them in no particular order.
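If you want the largest eigenvalue first (as PCA conventions usually do), a simple reversal keeps the eigenvalues and their eigenvector columns aligned; a minimal sketch:

```python
import numpy as np
from scipy import linalg

cov_matrix = np.array([[10.5, 2.3, 1.1],
                       [2.3, 8.7, 3.2],
                       [1.1, 3.2, 6.4]])

eigenvalues, eigenvectors = linalg.eigh(cov_matrix)

# Reverse both results so the largest eigenvalue (and its
# eigenvector column) comes first
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]
print(eigenvalues)
```

Reversing the eigenvector matrix column-wise (not row-wise) is important: column i of eigenvectors must stay paired with eigenvalues[i].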

Method 4: Use scipy.sparse.linalg for Large Matrices

For very large matrices, especially sparse ones, standard eigenvalue methods become inefficient. SciPy provides specialized functions for such cases:

import numpy as np
from scipy import sparse
from scipy.sparse import linalg

# Create a large sparse matrix (simulating a network adjacency matrix)
size = 1000
diagonals = [np.ones(size), np.ones(size-1), np.ones(size-1)]
offsets = [0, 1, -1]
sparse_matrix = sparse.diags(diagonals, offsets, format='csr')

# Find the 5 largest eigenvalues
eigenvalues, eigenvectors = linalg.eigsh(sparse_matrix, k=5, which='LM')

print("5 largest eigenvalues of the sparse matrix:")
print(eigenvalues)

Output:

5 largest eigenvalues of the sparse matrix:
[1.9999939  1.99996859 1.99992407 1.99985998 1.99977629]

This method is particularly useful when analyzing large networks, where only the most significant eigenvalues are needed.
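The same function can target the other end of the spectrum. As a sketch, passing which='SA' asks eigsh for the smallest algebraic eigenvalues of the same tridiagonal matrix (for very ill-conditioned problems, the sigma shift-invert parameter is often faster, but plain 'SA' suffices here):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

size = 1000
diagonals = [np.ones(size), np.ones(size - 1), np.ones(size - 1)]
sparse_matrix = sparse.diags(diagonals, [0, 1, -1], format='csr')

# 'SA' = smallest algebraic; other options include 'LA' (largest
# algebraic) and 'SM' (smallest magnitude)
smallest, _ = eigsh(sparse_matrix, k=5, which='SA')
print(smallest)
```

For this matrix the smallest eigenvalues sit just above -1, the lower edge of its spectrum.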


Solve Real-world Problems with Eigenvalues

Now let's look at how eigenvalues solve two common real-world problems.

Example 1: Principal Component Analysis (PCA)

One of the most common applications of eigenvalues is in PCA for dimensionality reduction:

import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

# Generate some correlated data (e.g., height and weight measurements)
np.random.seed(42)
n_samples = 100
height = np.random.normal(68, 3, n_samples)  # height in inches
weight = 2.2 * height + np.random.normal(0, 10, n_samples)  # weight in pounds
data = np.vstack((height, weight)).T

# Center the data
data_centered = data - np.mean(data, axis=0)

# Compute the covariance matrix
cov_matrix = np.cov(data_centered.T)

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = linalg.eigh(cov_matrix)

# Sort eigenvalues and eigenvectors in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Eigenvalues (principal component variances):")
print(eigenvalues)
print("\nEigenvectors (principal components):")
print(eigenvectors)

# Plot original data and principal components
plt.figure(figsize=(10, 6))
plt.scatter(data[:, 0], data[:, 1], alpha=0.7, label='Data Points')
mean = np.mean(data, axis=0)
for i, (eigenvalue, eigenvector) in enumerate(zip(eigenvalues, eigenvectors.T)):
    # Scale each arrow by the standard deviation (sqrt of the eigenvalue)
    # along its component, so the arrows stay on the same scale as the data
    plt.arrow(mean[0], mean[1],
              eigenvector[0] * np.sqrt(eigenvalue),
              eigenvector[1] * np.sqrt(eigenvalue),
              head_width=0.5, head_length=0.5, fc='red', ec='red',
              label=f'PC {i+1}')
plt.xlabel('Height (inches)')
plt.ylabel('Weight (pounds)')
plt.title('PCA of Height and Weight Data')
plt.legend()
plt.grid(True)
plt.axis('equal')
plt.show()

This PCA example visualizes how the principal components (determined by eigenvectors) align with the directions of maximum variance in the data.
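A natural follow-up question is how much variance each component explains. The sketch below recomputes the eigenvalues, using eigh's eigvals_only option to skip the eigenvectors, and normalizes them (the "explained ratio" naming is mine, not SciPy's):

```python
import numpy as np
from scipy import linalg

# Same synthetic height/weight data as above
np.random.seed(42)
n_samples = 100
height = np.random.normal(68, 3, n_samples)
weight = 2.2 * height + np.random.normal(0, 10, n_samples)
data = np.vstack((height, weight)).T

cov_matrix = np.cov((data - data.mean(axis=0)).T)

# eigvals_only=True skips computing eigenvectors; reverse for descending order
eigenvalues = linalg.eigh(cov_matrix, eigvals_only=True)[::-1]

# Share of total variance captured by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()
print(explained_ratio)
```

Because height and weight are strongly correlated, the first component captures the large majority of the variance, which is exactly the property PCA exploits for dimensionality reduction.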

Example 2: Spectral Clustering

Eigenvalues are also crucial in spectral clustering algorithms:

import numpy as np
from scipy import linalg
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
import matplotlib.pyplot as plt

# Generate some clustered data
X, y = make_blobs(n_samples=300, centers=4, random_state=42)

# Compute the similarity matrix
similarity = rbf_kernel(X, gamma=0.1)

# Compute the Laplacian matrix
diagonal = np.sum(similarity, axis=1)
D = np.diag(diagonal)
L = D - similarity

# Compute eigenvalues and eigenvectors of the Laplacian
eigenvalues, eigenvectors = linalg.eigh(L)

# Print the smallest eigenvalues
print("Smallest eigenvalues:")
print(eigenvalues[:10])

# Plot the second and third eigenvectors
plt.figure(figsize=(10, 6))
plt.scatter(eigenvectors[:, 1], eigenvectors[:, 2], c=y, cmap='viridis')
plt.title('Spectral Clustering Embedding')
plt.xlabel('Second Eigenvector')
plt.ylabel('Third Eigenvector')
plt.colorbar(label='True Cluster')
plt.grid(True)
plt.show()

This spectral clustering example demonstrates how eigenvalues of the graph Laplacian help identify the number of clusters, while the eigenvectors provide a lower-dimensional embedding that reveals the cluster structure.
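To actually assign cluster labels from that embedding, a common recipe is to run k-means on the leading eigenvectors of the Laplacian. This is a minimal sketch, not the full normalized spectral clustering algorithm; choosing 4 eigenvectors simply mirrors the 4 generated blobs:

```python
import numpy as np
from scipy import linalg
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
similarity = rbf_kernel(X, gamma=0.1)

# Unnormalized graph Laplacian L = D - W
lap = np.diag(similarity.sum(axis=1)) - similarity
eigenvalues, eigenvectors = linalg.eigh(lap)

# Use the eigenvectors of the smallest eigenvalues as cluster features
embedding = eigenvectors[:, :4]
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(embedding)
print(np.bincount(labels))
```

With well-separated blobs, the graph is nearly disconnected into four components, so the four smallest eigenvalues are close to zero and their eigenvectors act almost like cluster indicator vectors.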


Practical Tips for Working with Eigenvalues

  1. Check for symmetry: Use eigh instead of eig for symmetric matrices for better performance and numerical stability.
  2. Watch for numerical precision: Eigenvalues that should be zero might show up as very small numbers due to floating-point errors. You might need to round values close to zero.
  3. For large matrices: Consider using sparse methods or computing only the eigenvalues you need (the largest or the smallest few).
  4. Visualize eigenvectors: For 2D or 3D data, visualizing eigenvectors can provide insights into the principal directions in your data.
  5. Check for complex eigenvalues: In some applications, complex eigenvalues indicate oscillatory behavior in the system.
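Tip 5 is easy to see with a rotation matrix, which stretches no direction and therefore has no real eigendirections; a minimal example:

```python
import numpy as np
from scipy import linalg

# A 90-degree rotation in 2-D: every vector is rotated, none is merely
# scaled, so the eigenvalues are purely imaginary (a sign of oscillation)
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])

eigenvalues = linalg.eigvals(rotation)
print(eigenvalues)
```

The result is the conjugate pair ±1j: zero real part (no growth or decay) with a nonzero imaginary part (pure rotation), which in a dynamical system corresponds to undamped oscillation.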

I hope you found this article helpful in understanding how to compute and use eigenvalues with SciPy. Whether you’re analyzing data, building machine learning models, or solving engineering problems, eigenvalues provide powerful insights into the structure and behavior of your systems.
