How to Use SciPy gaussian_kde in Python?

Recently, I was working on a data analysis project where I needed to estimate the probability density function of a dataset. Traditional histograms weren't giving me the smooth representation I needed, and this is where Gaussian Kernel Density Estimation (KDE) from SciPy came to the rescue.

In this article, I'll cover everything you need to know about using gaussian_kde in SciPy, from basic implementation to advanced customization.

So let's get started!

What is Gaussian Kernel Density Estimation?

Gaussian KDE is a non-parametric way to estimate the probability density function of a random variable. Simply put, it helps you visualize the distribution of your data in a smooth curve rather than blocky histograms.

Think of it as placing a small Gaussian (normal) bump at each data point, then adding them all up to get a smooth curve. This gives you a much better sense of the underlying distribution.
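To make that idea concrete, here is a minimal sketch of a KDE built by hand: one Gaussian bump per data point, averaged. The data values and the bandwidth h are made up purely for illustration (SciPy picks the bandwidth automatically, as we'll see below), but if we force SciPy to use the same bandwidth, the two curves should agree:

```python
import numpy as np
from scipy import stats

# Toy data and a hand-picked bandwidth, purely for illustration
data = np.array([1.0, 2.0, 2.5, 4.0])
h = 0.5

def manual_kde(x, data, h):
    # One Gaussian bump per data point, then average them
    bumps = np.exp(-0.5 * ((x - data[:, None]) / h) ** 2)
    bumps /= h * np.sqrt(2 * np.pi)
    return bumps.mean(axis=0)

x = np.linspace(-1, 6, 500)
density = manual_kde(x, data, h)

# SciPy's bandwidth is factor * std(ddof=1), so matching factors
# should reproduce the same curve
kde = stats.gaussian_kde(data, bw_method=h / data.std(ddof=1))
print(np.allclose(density, kde(x)))
```

Like any density, the result integrates to 1: each bump is a normalized Gaussian, and averaging normalized densities keeps the total area at 1.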

Basic Implementation of gaussian_kde

Let’s start with a simple example to see how gaussian_kde works:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate some random data (let's say stock prices)
np.random.seed(42)
stock_prices = np.random.normal(250, 30, 1000)  # Mean $250, std $30

# Create the kernel density estimate
kde = stats.gaussian_kde(stock_prices)

# Create a range of x values to evaluate the KDE
x_vals = np.linspace(150, 350, 1000)
y_vals = kde(x_vals)

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(x_vals, y_vals, 'r-', label='KDE')
plt.hist(stock_prices, bins=30, density=True, alpha=0.5, label='Histogram')
plt.title('Stock Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

I executed the above example code and added the screenshot below.


This code creates a smooth KDE curve over a histogram of simulated stock price data. The gaussian_kde function takes your data array and returns a callable function that you can evaluate at any point.


Customize Bandwidth

One of the most important parameters in KDE is the bandwidth, which controls how smooth your density estimate will be:

# Default bandwidth (Scott's Rule)
kde_default = stats.gaussian_kde(stock_prices)

# Custom bandwidth - smaller for more detail
kde_narrow = stats.gaussian_kde(stock_prices, bw_method=0.3)

# Custom bandwidth - larger for smoother curve
kde_wide = stats.gaussian_kde(stock_prices, bw_method=1.5)

# Plot all three for comparison
plt.figure(figsize=(12, 7))
plt.plot(x_vals, kde_default(x_vals), 'r-', label='Default bandwidth')
plt.plot(x_vals, kde_narrow(x_vals), 'g-', label='Narrow bandwidth (0.3)')
plt.plot(x_vals, kde_wide(x_vals), 'b-', label='Wide bandwidth (1.5)')
plt.hist(stock_prices, bins=30, density=True, alpha=0.2, label='Histogram')
plt.title('Effect of Bandwidth on KDE')
plt.xlabel('Price ($)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

I executed the above example code and added the screenshot below.


The bw_method parameter can be:

  • A string: ‘scott’ (default) or ‘silverman’
  • A scalar: used as the bandwidth factor kde.factor (the kernel width is this factor times the data's standard deviation)
  • A function: to calculate the bandwidth

Choose a smaller value for more detail (can be noisy) or a larger value for a smoother curve (might miss important features).
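As a sketch of the callable option: the function receives the gaussian_kde instance itself and must return the bandwidth factor. Here it reimplements Scott's rule by hand (using the instance's neff and d attributes), so it should match the default exactly:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(250, 30, 1000)

def my_scott(kde):
    # A bw_method callable receives the gaussian_kde instance and
    # returns the bandwidth factor; this is Scott's rule by hand
    return kde.neff ** (-1.0 / (kde.d + 4))

kde_callable = stats.gaussian_kde(data, bw_method=my_scott)
kde_default = stats.gaussian_kde(data)  # Scott's rule is the default

print(kde_callable.factor, kde_default.factor)  # both the same
```

In practice you would put a custom rule here, for example Silverman's rule with a tweak suited to your data.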


Multivariate KDE

One of the strengths of gaussian_kde is its ability to handle multivariate data. Let’s see how to create a 2D density plot:

# Generate 2D data (e.g., height and weight)
np.random.seed(42)
heights = np.random.normal(175, 10, 1000)  # cm
weights = 0.5 * heights + np.random.normal(0, 10, 1000)  # correlated with height

# Create the 2D KDE
kde_2d = stats.gaussian_kde(np.vstack([heights, weights]))

# Create a grid of points to evaluate the KDE
h_grid = np.linspace(150, 200, 100)
w_grid = np.linspace(60, 120, 100)
H, W = np.meshgrid(h_grid, w_grid)
positions = np.vstack([H.ravel(), W.ravel()])

# Evaluate the KDE at the grid points
z = kde_2d(positions)
Z = z.reshape(H.shape)

# Plot the results
plt.figure(figsize=(10, 8))
plt.scatter(heights, weights, alpha=0.3, s=20)
plt.contour(H, W, Z, colors='k', linewidths=1)
plt.contourf(H, W, Z, cmap='viridis', alpha=0.5)
plt.colorbar(label='Density')
plt.title('2D Kernel Density Estimation: Height vs Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.tight_layout()
plt.show()

I executed the above example code and added the screenshot below.


This creates a beautiful 2D density plot showing the relationship between height and weight. The darker areas represent regions with a higher density of points.
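Beyond plotting, the fitted object can also integrate the density for you. As a sketch (reusing the same simulated height/weight data, with rectangle bounds chosen arbitrarily), gaussian_kde.integrate_box estimates the probability mass inside a box, answering questions like "what fraction of people fall in this height and weight range?":

```python
import numpy as np
from scipy import stats

np.random.seed(42)
heights = np.random.normal(175, 10, 1000)
weights = 0.5 * heights + np.random.normal(0, 10, 1000)
kde_2d = stats.gaussian_kde(np.vstack([heights, weights]))

# Probability mass in the rectangle: 170-185 cm by 80-95 kg
p_box = kde_2d.integrate_box([170, 80], [185, 95])
print(f"P(170<height<185, 80<weight<95) = {p_box:.3f}")

# A very wide box should capture essentially all of the mass
p_all = kde_2d.integrate_box([100, 0], [250, 200])
print(f"Total mass in a wide box: {p_all:.3f}")
```

This is often more useful than reading densities off a contour plot, since it gives you an actual probability rather than a density value.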


Evaluate KDE at Specific Points

Sometimes you need to know the density at specific points rather than plotting the entire curve:

# Create the KDE from simulated income data
np.random.seed(42)
incomes = np.random.lognormal(mean=11, sigma=0.4, size=1000)  # Income distribution
kde = stats.gaussian_kde(incomes)

# Evaluate at specific points
points_of_interest = [40000, 60000, 80000, 100000]
densities = kde(points_of_interest)

# Print results
for point, density in zip(points_of_interest, densities):
    print(f"Density at ${point:,}: {density:.8f}")

This is particularly useful when you need to compare the relative likelihood of different values or find the most probable regions.
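Building on that, here is a sketch of both ideas: locating the mode (the single most probable value) with a simple grid search, and comparing two values via their density ratio. The specific incomes compared ($60,000 vs $120,000) are arbitrary examples:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
incomes = np.random.lognormal(mean=11, sigma=0.4, size=1000)
kde = stats.gaussian_kde(incomes)

# Find the mode with a simple grid search over the data range
grid = np.linspace(incomes.min(), incomes.max(), 5000)
mode = grid[np.argmax(kde(grid))]
print(f"Most probable income: ${mode:,.0f}")

# How much more likely is an income of $60k than $120k?
ratio = kde([60000])[0] / kde([120000])[0]
print(f"$60,000 is {ratio:.1f}x more likely than $120,000")
```

A grid search is crude but usually good enough; for a sharper answer you could refine it with scipy.optimize around the grid maximum.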

Use KDE for Smoothed Bootstrapping

KDE can be combined with bootstrapping to generate new samples with slight variations:

# Original data
sales_data = np.random.gamma(shape=2, scale=1000, size=100)  # Monthly sales

# Create KDE
kde = stats.gaussian_kde(sales_data)

# Generate new samples from the KDE
smoothed_bootstrap_samples = kde.resample(1000)

# Plot comparison
plt.figure(figsize=(12, 6))
plt.hist(sales_data, bins=20, alpha=0.5, label='Original data')
plt.hist(smoothed_bootstrap_samples[0], bins=20, alpha=0.5, label='KDE resampled')
plt.title('Original Data vs KDE Resampled Data')
plt.xlabel('Monthly Sales ($)')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

This technique gives you new samples that follow the same distribution as your original data but with slight variations, which can be useful for simulations and uncertainty analysis.
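For example, here is a sketch of turning resampling into a rough confidence interval for the mean. The number of resamples (500) is an arbitrary choice, and the seed argument of resample (available in recent SciPy versions) makes each draw reproducible:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
sales_data = np.random.gamma(shape=2, scale=1000, size=100)
kde = stats.gaussian_kde(sales_data)

# Smoothed bootstrap: draw many resamples and look at the spread of means
boot_means = [kde.resample(100, seed=i)[0].mean() for i in range(500)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% interval for mean monthly sales: ${lo:,.0f} to ${hi:,.0f}")
```

Unlike a plain bootstrap, which only re-draws the original points, the smoothed version can produce values between them, which often behaves better for small samples.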


Weighted KDE

If some data points are more important than others, you can pass per-point weights (supported in SciPy 1.2 and later):

# Data with weights
data = np.concatenate([np.random.normal(100, 10, 500), 
                       np.random.normal(150, 15, 500)])
weights = np.concatenate([np.ones(500), np.ones(500) * 2])  # Second group is twice as important

# Weighted KDE
kde_weighted = stats.gaussian_kde(data, weights=weights)
kde_unweighted = stats.gaussian_kde(data)

# Plot
x = np.linspace(50, 200, 1000)
plt.figure(figsize=(12, 6))
plt.plot(x, kde_weighted(x), 'r-', label='Weighted KDE')
plt.plot(x, kde_unweighted(x), 'b-', label='Unweighted KDE')
plt.hist(data, bins=30, density=True, alpha=0.3)
plt.title('Weighted vs Unweighted KDE')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Notice how the weighted KDE puts more emphasis on the second peak since those points have higher weights.
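One detail worth knowing, easy to verify with a quick sketch: the weights are normalized internally, so only their relative sizes matter. Scaling every weight by the same constant (7.0 below, chosen arbitrarily) changes nothing:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(100, 10, 500)

# Uniform weights, scaled by an arbitrary constant, are normalized
# away internally -- the estimate is identical to the unweighted one
kde_scaled = stats.gaussian_kde(data, weights=np.full(500, 7.0))
kde_plain = stats.gaussian_kde(data)

x = np.linspace(60, 140, 200)
print(np.allclose(kde_scaled(x), kde_plain(x)))
```

So you can pass raw counts, frequencies, or importance scores directly without normalizing them first.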


Practical Applications

Gaussian KDE has many real-world applications:

  1. Financial analysis – Estimating the distribution of stock returns
  2. Demographics – Smoothing income or age distributions
  3. Natural sciences – Analyzing distributions of measurements
  4. Machine learning – Feature engineering and density-based anomaly detection

For example, in anomaly detection, you can flag data points with very low density values as potential outliers:

# Generate data with outliers
normal_data = np.random.normal(0, 1, 1000)
outliers = np.random.uniform(-5, 5, 20)
data = np.concatenate([normal_data, outliers])

# Create KDE
kde = stats.gaussian_kde(normal_data)  # Train on normal data only

# Calculate density for all points
densities = kde(data)

# Flag potential outliers (points with very low density)
threshold = np.percentile(densities, 1)  # Bottom 1% as threshold
outlier_indices = np.where(densities < threshold)[0]

# Plot
plt.figure(figsize=(12, 6))
plt.scatter(data, np.zeros_like(data), alpha=0.5, label='All data')
plt.scatter(data[outlier_indices], np.zeros_like(data[outlier_indices]), 
            color='red', marker='x', s=100, label='Potential outliers')
plt.title('Anomaly Detection using KDE')
plt.xlabel('Value')
plt.yticks([])
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

SciPy’s gaussian_kde is a powerful tool for density estimation that can enhance your data analysis toolkit. It allows you to transform discrete data points into smooth, continuous probability distributions, giving you better insights into your data.

Whether you’re analyzing stock market returns, demographic data, or scientific measurements, gaussian_kde provides a flexible way to understand the underlying distributions. The ability to customize bandwidth, use weights, and handle multiple dimensions makes it adaptable to a wide range of applications.

I hope this guide helps you implement gaussian_kde in your projects. If you have any questions or suggestions, feel free to leave them in the comments below.
