Python Scipy Stats Norm

Recently, I was working on a data science project where I needed to analyze normally distributed data and make statistical inferences. The solution was clear: I needed to use SciPy’s stats.norm functionality, which provides a powerful set of tools for working with normal distributions in Python.

In this tutorial, I’ll walk you through everything you need to know about using scipy.stats.norm with practical examples that you can apply to your projects.

Let’s get started.

What is Scipy Stats Norm?

SciPy’s stats.norm is a continuous random variable class that represents the normal (Gaussian) distribution in Python. It’s part of the broader SciPy ecosystem, which is designed for scientific and technical computing.

The normal distribution is fundamental in statistics and data science, appearing in countless real-world scenarios, from heights and weights of populations to measurement errors and stock market returns.

Get Started with Scipy Stats Norm

Before we begin using stats.norm, you’ll need SciPy installed, along with NumPy and Matplotlib for the examples:

pip install scipy numpy matplotlib

Now, let’s import the necessary libraries:

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt


Method 1: Generate Random Numbers from a Normal Distribution

One of the most common uses of stats.norm is generating normally distributed random numbers.

# Generate 1000 random numbers from a normal distribution with mean=0 and std=1
random_numbers = stats.norm.rvs(loc=0, scale=1, size=1000)

# Plot a histogram of the generated data
plt.hist(random_numbers, bins=30, density=True, alpha=0.7)
plt.title('Normal Distribution: μ=0, σ=1')
plt.xlabel('Value')
plt.ylabel('Density')
plt.grid(True, alpha=0.3)
plt.show()


In this example, loc represents the mean (μ) and scale represents the standard deviation (σ) of our normal distribution.
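A note on reproducibility: rvs also accepts a random_state argument (an integer seed or a NumPy Generator), which is handy when you need the same "random" sample on every run. A minimal sketch:

```python
import scipy.stats as stats

# Seeding rvs makes the random sample reproducible
sample_a = stats.norm.rvs(loc=10, scale=2, size=5, random_state=42)
sample_b = stats.norm.rvs(loc=10, scale=2, size=5, random_state=42)
print(sample_a)
print((sample_a == sample_b).all())  # True: identical seeds give identical draws
```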

Method 2: Calculate Probability Density Function (PDF)

The PDF gives the relative likelihood (density) at each point in the distribution; for a continuous variable, actual probabilities correspond to areas under this curve.

# Create a range of x values
x = np.linspace(-4, 4, 1000)

# Calculate the PDF for a standard normal distribution (mean=0, std=1)
pdf = stats.norm.pdf(x, loc=0, scale=1)

# Plot the PDF
plt.plot(x, pdf)
plt.title('Normal Distribution PDF: μ=0, σ=1')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True, alpha=0.3)
plt.show()


The PDF curve shows us the classic bell shape of the normal distribution.
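Under the hood, pdf evaluates the familiar Gaussian density f(x) = exp(−(x−μ)²/(2σ²)) / (σ√(2π)). A quick sanity check against a hand-coded version (the test point x = 1.5 is arbitrary):

```python
import numpy as np
import scipy.stats as stats

# Hand-coded Gaussian density: f(x) = exp(-(x-mu)^2 / (2*sigma^2)) / (sigma*sqrt(2*pi))
mu, sigma, x = 0.0, 1.0, 1.5  # x = 1.5 is an arbitrary test point
manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
print(manual)                                  # ~0.1295
print(stats.norm.pdf(x, loc=mu, scale=sigma))  # same value
```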


Method 3: Compute the Cumulative Distribution Function (CDF)

The CDF tells us the probability that a random variable will take a value less than or equal to a particular value.

# Calculate the CDF for the same values
cdf = stats.norm.cdf(x, loc=0, scale=1)

# Plot the CDF
plt.plot(x, cdf)
plt.title('Normal Distribution CDF: μ=0, σ=1')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.grid(True, alpha=0.3)
plt.show()

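A few landmark values make the CDF concrete. This sketch also uses the "frozen" distribution form, stats.norm(loc, scale), which fixes the parameters once so you don't repeat them in every call:

```python
import scipy.stats as stats

# Freeze a standard normal so loc/scale are set once
z = stats.norm(loc=0, scale=1)
print(z.cdf(0))     # 0.5 — half the mass lies below the mean
print(z.cdf(1.96))  # ≈ 0.975
print(z.sf(1.96))   # ≈ 0.025; sf is the survival function, 1 - cdf
```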

Method 4: Find Percentiles with Percent Point Function (PPF)

The PPF is the inverse of the CDF. It gives us the value below which a specified probability of the random variable lies.

# Find the 95th percentile of a normal distribution with mean=100 and std=15
percentile_95 = stats.norm.ppf(0.95, loc=100, scale=15)
print(f"95th percentile: {percentile_95:.2f}")

# Calculate several percentiles
percentiles = [0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
values = stats.norm.ppf(percentiles, loc=100, scale=15)

for p, v in zip(percentiles, values):
    print(f"{p*100}th percentile: {v:.2f}")


This is particularly useful for finding confidence intervals and critical values.
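For example, the two-sided critical z-values used in confidence intervals come straight from ppf, by splitting the leftover probability between the two tails. A quick sketch for the common confidence levels:

```python
import scipy.stats as stats

# Two-sided critical value: put half of (1 - confidence) in each tail
for conf in (0.90, 0.95, 0.99):
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    print(f"{conf:.0%} confidence -> z = {z:.3f}")  # 1.645, 1.960, 2.576
```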

Method 5: Calculate Probabilities Between Two Points

We can use the CDF to find the probability that a value falls within a specific range.

# Calculate probability of a value falling between 90 and 110 in a N(100, 15) distribution
prob = stats.norm.cdf(110, loc=100, scale=15) - stats.norm.cdf(90, loc=100, scale=15)
print(f"Probability between 90 and 110: {prob:.4f}")
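This pattern is worth wrapping in a small helper. prob_between below is a hypothetical convenience function (not part of SciPy), and the classic 68-95-99.7 rule falls out of the same calculation:

```python
import scipy.stats as stats

def prob_between(a, b, mu=0.0, sigma=1.0):
    """Hypothetical helper: P(a <= X <= b) for X ~ N(mu, sigma)."""
    return stats.norm.cdf(b, loc=mu, scale=sigma) - stats.norm.cdf(a, loc=mu, scale=sigma)

print(prob_between(90, 110, mu=100, sigma=15))  # ≈ 0.4950, as above
print(prob_between(-1, 1))                      # ≈ 0.6827 — within 1 sigma
print(prob_between(-2, 2))                      # ≈ 0.9545 — within 2 sigma
```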

Method 6: Fit Normal Distribution to Data

Sometimes we have data and want to estimate the parameters of the normal distribution that best describes it.

# Generate some sample data that's roughly normal
data = stats.norm.rvs(loc=50, scale=5, size=1000)

# Fit a normal distribution to the data
mu, sigma = stats.norm.fit(data)
print(f"Fitted mean: {mu:.2f}")
print(f"Fitted standard deviation: {sigma:.2f}")

# Plot the original data and the fitted distribution
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
fitted_pdf = stats.norm.pdf(x, loc=mu, scale=sigma)

plt.hist(data, bins=30, density=True, alpha=0.7, label='Sample Data')
plt.plot(x, fitted_pdf, 'r-', label='Fitted Normal Distribution')
plt.title('Fitting a Normal Distribution to Data')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Method 7: Test for Normality

To determine if our data follows a normal distribution, we can use statistical tests.

# Perform the Shapiro-Wilk test for normality
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk test - statistic: {stat:.4f}, p-value: {p_value:.4f}")

# Interpret the result
alpha = 0.05
if p_value > alpha:
    print("The data appears to be normally distributed (fail to reject H0)")
else:
    print("The data does not appear to be normally distributed (reject H0)")
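Shapiro-Wilk is best suited to small and moderate sample sizes; SciPy also offers stats.normaltest (the D'Agostino-Pearson test, based on skewness and kurtosis) as an alternative. A sketch with its own generated data:

```python
import scipy.stats as stats

# Generate normal data, then test it with D'Agostino-Pearson
data = stats.norm.rvs(loc=50, scale=5, size=1000, random_state=7)
stat, p_value = stats.normaltest(data)
print(f"normaltest - statistic: {stat:.4f}, p-value: {p_value:.4f}")
# As with Shapiro-Wilk, a large p-value means we fail to reject normality
```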


Method 8: Z-Scores and Standard Normal Distribution

Z-scores tell us how many standard deviations a data point is from the mean.

# Convert raw scores to z-scores
raw_scores = [85, 90, 95, 100, 105, 110, 115]
mean = 100
std_dev = 15

z_scores = [(score - mean) / std_dev for score in raw_scores]
print("Raw scores:", raw_scores)
print("Z-scores:", [f"{z:.2f}" for z in z_scores])

# Convert z-scores back to raw scores in a different scale
new_mean = 500
new_std_dev = 100
new_scores = [(z * new_std_dev) + new_mean for z in z_scores]
print("Rescaled scores:", [f"{score:.2f}" for score in new_scores])

This is particularly useful for standardizing test scores, like SAT or GRE in the US education system.
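SciPy can also standardize an array directly with stats.zscore, which uses the sample's own mean and standard deviation rather than known population parameters as above:

```python
import numpy as np
import scipy.stats as stats

raw_scores = np.array([85, 90, 95, 100, 105, 110, 115])

# zscore standardizes with the sample mean (100) and population std (10)
z = stats.zscore(raw_scores)
print(z)  # [-1.5 -1.  -0.5  0.   0.5  1.   1.5]
```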

Method 9: Confidence Intervals for the Mean

We can use the normal distribution to compute confidence intervals.

# Sample data: test scores from a class
test_scores = stats.norm.rvs(loc=78, scale=12, size=30)

# Sample statistics
sample_mean = np.mean(test_scores)
sample_std = np.std(test_scores, ddof=1)  # Using n-1 for sample std
n = len(test_scores)
std_error = sample_std / np.sqrt(n)

# 95% confidence interval
z_critical = stats.norm.ppf(0.975)  # For 95% CI, we need the 97.5th percentile
margin_of_error = z_critical * std_error
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"Sample mean: {sample_mean:.2f}")
print(f"95% Confidence Interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")
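SciPy can produce the same interval in one call with norm.interval, which returns the central range containing the given probability. A sketch with illustrative sample statistics:

```python
import scipy.stats as stats

sample_mean, std_error = 78.4, 2.2  # illustrative values for a sample of scores

# interval(0.95, ...) returns the central 95% range of N(loc, scale)
low, high = stats.norm.interval(0.95, loc=sample_mean, scale=std_error)
print(f"95% Confidence Interval: ({low:.2f}, {high:.2f})")
```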

Method 10: Analyze SAT Score Distributions

Let’s apply our knowledge to a real-world example: analyzing SAT scores, which are approximately normally distributed.

# Assume SAT-style scoring that is approximately normal with mean=1000 and std=200
sat_mean = 1000
sat_std = 200

# Calculate the percentile for a student who scored 1250
percentile = stats.norm.cdf(1250, loc=sat_mean, scale=sat_std)
print(f"A score of 1250 is in the {percentile*100:.1f}th percentile")

# What score is needed to be in the top 10% of test takers?
top_10_percent_score = stats.norm.ppf(0.90, loc=sat_mean, scale=sat_std)
print(f"To be in the top 10%, you need a score of at least {top_10_percent_score:.0f}")

# Probability of scoring between 900 and 1100
middle_range_prob = stats.norm.cdf(1100, loc=sat_mean, scale=sat_std) - stats.norm.cdf(900, loc=sat_mean, scale=sat_std)
print(f"Probability of scoring between 900 and 1100: {middle_range_prob:.4f}")

# Plot the SAT score distribution
x = np.linspace(400, 1600, 1000)
sat_pdf = stats.norm.pdf(x, loc=sat_mean, scale=sat_std)

plt.figure(figsize=(10, 6))
plt.plot(x, sat_pdf)
plt.fill_between(x, sat_pdf, where=(x >= 900) & (x <= 1100), alpha=0.3)
plt.title('SAT Score Distribution')
plt.xlabel('SAT Score')
plt.ylabel('Probability Density')
plt.axvline(sat_mean, color='red', linestyle='--', alpha=0.7, label='Mean (1000)')
plt.axvline(top_10_percent_score, color='green', linestyle='--', alpha=0.7, label='90th Percentile')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

The normal distribution plays a crucial role in standardized testing in the US education system, which makes this example particularly relevant.
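As a final sanity check on any parameterization, the distribution object can report its theoretical moments directly via the stats method:

```python
import scipy.stats as stats

# Mean, variance, skewness, excess kurtosis for N(1000, 200)
mean, var, skew, kurt = stats.norm.stats(loc=1000, scale=200, moments='mvsk')
print(float(mean), float(var))   # 1000.0 40000.0
print(float(skew), float(kurt))  # 0.0 0.0 — symmetric, no excess kurtosis
```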

I hope you found this article helpful for understanding how to use scipy.stats.norm in Python. Whether you’re analyzing test scores, experimental data, or financial metrics, the normal distribution is a powerful tool in your data science arsenal.
