How to Use SciPy gaussian_kde in Python?

Recently, I was working on a data analysis project where I needed to estimate the probability density function of a dataset. Traditional histograms weren't giving me the smooth representation I needed, and this is where Gaussian Kernel Density Estimation (KDE) from SciPy came to the rescue.

In this article, I'll cover everything you need to know about using gaussian_kde in SciPy, from basic implementation to advanced customization.

So let's get started!

What is Gaussian Kernel Density Estimation?

Gaussian KDE is a non-parametric way to estimate the probability density function of a random variable. Simply put, it helps you visualize the distribution of your data in a smooth curve rather than blocky histograms.

Think of it as placing a small Gaussian (normal) bump at each data point, then adding them all up to get a smooth curve. This gives you a much better sense of the underlying distribution.
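To make that idea concrete, here is a minimal sketch of a KDE built by hand: one Gaussian bump per data point, averaged. The data values and the bandwidth h are made up purely for illustration (SciPy picks the bandwidth automatically, as we'll see below), but if we force SciPy to use the same bandwidth, the two curves should agree:

```python
import numpy as np
from scipy import stats

# Toy data and a hand-picked bandwidth, purely for illustration
data = np.array([1.0, 2.0, 2.5, 4.0])
h = 0.5

def manual_kde(x, data, h):
    # One Gaussian bump per data point, then average them
    bumps = np.exp(-0.5 * ((x - data[:, None]) / h) ** 2)
    bumps /= h * np.sqrt(2 * np.pi)
    return bumps.mean(axis=0)

x = np.linspace(-1, 6, 500)
density = manual_kde(x, data, h)

# SciPy's bandwidth is factor * std(ddof=1), so matching factors
# should reproduce the same curve
kde = stats.gaussian_kde(data, bw_method=h / data.std(ddof=1))
print(np.allclose(density, kde(x)))
```

Like any density, the result integrates to 1: each bump is a normalized Gaussian, and averaging normalized densities keeps the total area at 1.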

Basic Implementation of gaussian_kde

Let’s start with a simple example to see how gaussian_kde works:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate some random data (let's say stock prices)
np.random.seed(42)
stock_prices = np.random.normal(250, 30, 1000)  # Mean $250, std $30

# Create the kernel density estimate
kde = stats.gaussian_kde(stock_prices)

# Create a range of x values to evaluate the KDE
x_vals = np.linspace(150, 350, 1000)
y_vals = kde(x_vals)

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(x_vals, y_vals, 'r-', label='KDE')
plt.hist(stock_prices, bins=30, density=True, alpha=0.5, label='Histogram')
plt.title('Stock Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

I executed the above example code and added the screenshot below.


This code creates a smooth KDE curve over a histogram of simulated stock price data. The gaussian_kde function takes your data array and returns a callable function that you can evaluate at any point.


Customize Bandwidth

One of the most important parameters in KDE is the bandwidth, which controls how smooth your density estimate will be:

# Default bandwidth (Scott's Rule)
kde_default = stats.gaussian_kde(stock_prices)

# Custom bandwidth - smaller for more detail
kde_narrow = stats.gaussian_kde(stock_prices, bw_method=0.3)

# Custom bandwidth - larger for smoother curve
kde_wide = stats.gaussian_kde(stock_prices, bw_method=1.5)

# Plot all three for comparison
plt.figure(figsize=(12, 7))
plt.plot(x_vals, kde_default(x_vals), 'r-', label='Default bandwidth')
plt.plot(x_vals, kde_narrow(x_vals), 'g-', label='Narrow bandwidth (0.3)')
plt.plot(x_vals, kde_wide(x_vals), 'b-', label='Wide bandwidth (1.5)')
plt.hist(stock_prices, bins=30, density=True, alpha=0.2, label='Histogram')
plt.title('Effect of Bandwidth on KDE')
plt.xlabel('Price ($)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

I executed the above example code and added the screenshot below.


The bw_method parameter can be:

  • A string: ‘scott’ (default) or ‘silverman’
  • A scalar: used as the bandwidth factor kde.factor (the kernel width is this factor times the data's standard deviation)
  • A function: to calculate the bandwidth

Choose a smaller value for more detail (can be noisy) or a larger value for a smoother curve (might miss important features).
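As a sketch of the callable option: the function receives the gaussian_kde instance itself and must return the bandwidth factor. Here it reimplements Scott's rule by hand (using the instance's neff and d attributes), so it should match the default exactly:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(250, 30, 1000)

def my_scott(kde):
    # A bw_method callable receives the gaussian_kde instance and
    # returns the bandwidth factor; this is Scott's rule by hand
    return kde.neff ** (-1.0 / (kde.d + 4))

kde_callable = stats.gaussian_kde(data, bw_method=my_scott)
kde_default = stats.gaussian_kde(data)  # Scott's rule is the default

print(kde_callable.factor, kde_default.factor)  # both the same
```

In practice you would put a custom rule here, for example Silverman's rule with a tweak suited to your data.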


Multivariate KDE

One of the strengths of gaussian_kde is its ability to handle multivariate data. Let’s see how to create a 2D density plot:

# Generate 2D data (e.g., height and weight)
np.random.seed(42)
heights = np.random.normal(175, 10, 1000)  # cm
weights = 0.5 * heights + np.random.normal(0, 10, 1000)  # correlated with height

# Create the 2D KDE
kde_2d = stats.gaussian_kde(np.vstack([heights, weights]))

# Create a grid of points to evaluate the KDE
h_grid = np.linspace(150, 200, 100)
w_grid = np.linspace(60, 120, 100)
H, W = np.meshgrid(h_grid, w_grid)
positions = np.vstack([H.ravel(), W.ravel()])

# Evaluate the KDE at the grid points
z = kde_2d(positions)
Z = z.reshape(H.shape)

# Plot the results
plt.figure(figsize=(10, 8))
plt.scatter(heights, weights, alpha=0.3, s=20)
plt.contour(H, W, Z, colors='k', linewidths=1)
plt.contourf(H, W, Z, cmap='viridis', alpha=0.5)
plt.colorbar(label='Density')
plt.title('2D Kernel Density Estimation: Height vs Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.tight_layout()
plt.show()

I executed the above example code and added the screenshot below.


This creates a beautiful 2D density plot showing the relationship between height and weight. The darker areas represent regions with a higher density of points.
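Beyond plotting, the fitted object can also integrate the density for you. As a sketch (reusing the same simulated height/weight data, with rectangle bounds chosen arbitrarily), gaussian_kde.integrate_box estimates the probability mass inside a box, answering questions like "what fraction of people fall in this height and weight range?":

```python
import numpy as np
from scipy import stats

np.random.seed(42)
heights = np.random.normal(175, 10, 1000)
weights = 0.5 * heights + np.random.normal(0, 10, 1000)
kde_2d = stats.gaussian_kde(np.vstack([heights, weights]))

# Probability mass in the rectangle: 170-185 cm by 80-95 kg
p_box = kde_2d.integrate_box([170, 80], [185, 95])
print(f"P(170<height<185, 80<weight<95) = {p_box:.3f}")

# A very wide box should capture essentially all of the mass
p_all = kde_2d.integrate_box([100, 0], [250, 200])
print(f"Total mass in a wide box: {p_all:.3f}")
```

This is often more useful than reading densities off a contour plot, since it gives you an actual probability rather than a density value.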


Evaluate KDE at Specific Points

Sometimes you need to know the density at specific points rather than plotting the entire curve:

# Create the KDE from simulated income data
np.random.seed(42)
incomes = np.random.lognormal(mean=11, sigma=0.4, size=1000)  # Income distribution
kde = stats.gaussian_kde(incomes)

# Evaluate at specific points
points_of_interest = [40000, 60000, 80000, 100000]
densities = kde(points_of_interest)

# Print results
for point, density in zip(points_of_interest, densities):
    print(f"Density at ${point:,}: {density:.8f}")

This is particularly useful when you need to compare the relative likelihood of different values or find the most probable regions.
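Building on that, here is a sketch of both ideas: locating the mode (the single most probable value) with a simple grid search, and comparing two values via their density ratio. The specific incomes compared ($60,000 vs $120,000) are arbitrary examples:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
incomes = np.random.lognormal(mean=11, sigma=0.4, size=1000)
kde = stats.gaussian_kde(incomes)

# Find the mode with a simple grid search over the data range
grid = np.linspace(incomes.min(), incomes.max(), 5000)
mode = grid[np.argmax(kde(grid))]
print(f"Most probable income: ${mode:,.0f}")

# How much more likely is an income of $60k than $120k?
ratio = kde([60000])[0] / kde([120000])[0]
print(f"$60,000 is {ratio:.1f}x more likely than $120,000")
```

A grid search is crude but usually good enough; for a sharper answer you could refine it with scipy.optimize around the grid maximum.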

Use KDE for Smoothed Bootstrapping

KDE can be combined with bootstrapping to generate new samples with slight variations:

# Original data
sales_data = np.random.gamma(shape=2, scale=1000, size=100)  # Monthly sales

# Create KDE
kde = stats.gaussian_kde(sales_data)

# Generate new samples from the KDE
smoothed_bootstrap_samples = kde.resample(1000)

# Plot comparison
plt.figure(figsize=(12, 6))
plt.hist(sales_data, bins=20, alpha=0.5, label='Original data')
plt.hist(smoothed_bootstrap_samples[0], bins=20, alpha=0.5, label='KDE resampled')
plt.title('Original Data vs KDE Resampled Data')
plt.xlabel('Monthly Sales ($)')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

This technique gives you new samples that follow the same distribution as your original data but with slight variations, which can be useful for simulations and uncertainty analysis.
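For example, here is a sketch of turning resampling into a rough confidence interval for the mean. The number of resamples (500) is an arbitrary choice, and the seed argument of resample (available in recent SciPy versions) makes each draw reproducible:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
sales_data = np.random.gamma(shape=2, scale=1000, size=100)
kde = stats.gaussian_kde(sales_data)

# Smoothed bootstrap: draw many resamples and look at the spread of means
boot_means = [kde.resample(100, seed=i)[0].mean() for i in range(500)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% interval for mean monthly sales: ${lo:,.0f} to ${hi:,.0f}")
```

Unlike a plain bootstrap, which only re-draws the original points, the smoothed version can produce values between them, which often behaves better for small samples.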


Weighted KDE

If some data points are more important than others, you can pass per-point weights (supported in SciPy 1.2 and later):

# Data with weights
data = np.concatenate([np.random.normal(100, 10, 500), 
                       np.random.normal(150, 15, 500)])
weights = np.concatenate([np.ones(500), np.ones(500) * 2])  # Second group is twice as important

# Weighted KDE
kde_weighted = stats.gaussian_kde(data, weights=weights)
kde_unweighted = stats.gaussian_kde(data)

# Plot
x = np.linspace(50, 200, 1000)
plt.figure(figsize=(12, 6))
plt.plot(x, kde_weighted(x), 'r-', label='Weighted KDE')
plt.plot(x, kde_unweighted(x), 'b-', label='Unweighted KDE')
plt.hist(data, bins=30, density=True, alpha=0.3)
plt.title('Weighted vs Unweighted KDE')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Notice how the weighted KDE puts more emphasis on the second peak since those points have higher weights.
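One detail worth knowing, easy to verify with a quick sketch: the weights are normalized internally, so only their relative sizes matter. Scaling every weight by the same constant (7.0 below, chosen arbitrarily) changes nothing:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(100, 10, 500)

# Uniform weights, scaled by an arbitrary constant, are normalized
# away internally -- the estimate is identical to the unweighted one
kde_scaled = stats.gaussian_kde(data, weights=np.full(500, 7.0))
kde_plain = stats.gaussian_kde(data)

x = np.linspace(60, 140, 200)
print(np.allclose(kde_scaled(x), kde_plain(x)))
```

So you can pass raw counts, frequencies, or importance scores directly without normalizing them first.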


Practical Applications

Gaussian KDE has many real-world applications:

  1. Financial analysis – Estimating the distribution of stock returns
  2. Demographics – Smoothing income or age distributions
  3. Natural sciences – Analyzing distributions of measurements
  4. Machine learning – Feature engineering and density-based anomaly detection

For example, in anomaly detection, you can flag data points with very low density values as potential outliers:

# Generate data with outliers
normal_data = np.random.normal(0, 1, 1000)
outliers = np.random.uniform(-5, 5, 20)
data = np.concatenate([normal_data, outliers])

# Create KDE
kde = stats.gaussian_kde(normal_data)  # Train on normal data only

# Calculate density for all points
densities = kde(data)

# Flag potential outliers (points with very low density)
threshold = np.percentile(densities, 1)  # Bottom 1% as threshold
outlier_indices = np.where(densities < threshold)[0]

# Plot
plt.figure(figsize=(12, 6))
plt.scatter(data, np.zeros_like(data), alpha=0.5, label='All data')
plt.scatter(data[outlier_indices], np.zeros_like(data[outlier_indices]), 
            color='red', marker='x', s=100, label='Potential outliers')
plt.title('Anomaly Detection using KDE')
plt.xlabel('Value')
plt.yticks([])
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

SciPy’s gaussian_kde is a powerful tool for density estimation that can enhance your data analysis toolkit. It allows you to transform discrete data points into smooth, continuous probability distributions, giving you better insights into your data.

Whether you’re analyzing stock market returns, demographic data, or scientific measurements, gaussian_kde provides a flexible way to understand the underlying distributions. The ability to customize bandwidth, use weights, and handle multiple dimensions makes it adaptable to a wide range of applications.

I hope this guide helps you implement gaussian_kde in your projects. If you have any questions or suggestions, feel free to leave them in the comments below.
