Python Scipy Gamma

In my decade-plus of Python development, I’ve frequently needed to work with statistical distributions for data analysis and modeling. The gamma distribution is particularly useful for modeling skewed data that can’t go below zero, perfect for analyzing things like rainfall amounts, insurance claims, or service times.

SciPy makes working with gamma distributions remarkably simple, but there are some nuances worth understanding.

In this article, I’ll walk you through everything you need to know about using the gamma distribution in SciPy, from basic usage to practical applications.

The Gamma Distribution

The gamma distribution is a two-parameter continuous probability distribution that’s widely used for modeling positive data with right-skewed characteristics.

It’s defined by two parameters:

  • Shape parameter (a or k)
  • Scale parameter (θ)

This distribution is particularly useful when modeling variables that are always positive and may have a skewed distribution.

Python Scipy Gamma Distribution

Let’s start with a basic example of how to generate a gamma distribution:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Define shape and scale parameters
shape = 2.0
scale = 2.0

# Create a gamma distribution object
gamma_dist = stats.gamma(a=shape, scale=scale)

# Generate x values
x = np.linspace(0, 15, 1000)

# Calculate PDF values
pdf_values = gamma_dist.pdf(x)

# Plot the distribution
plt.figure(figsize=(10, 6))
plt.plot(x, pdf_values, 'r-', lw=2)
plt.title('Gamma Distribution (shape=2, scale=2)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True)
plt.show()

You can see the output in the screenshot below.

gamma function python

This code creates a gamma distribution with shape=2 and scale=2, then plots its probability density function.

Read Python SciPy Pairwise Distance

Generate Random Samples from Gamma Distribution

One common task is generating random samples from a gamma distribution:

# Generate 1000 random samples from the gamma distribution
samples = gamma_dist.rvs(size=1000)

# Plot histogram of samples
plt.figure(figsize=(10, 6))
plt.hist(samples, bins=30, density=True, alpha=0.7)
plt.plot(x, pdf_values, 'r-', lw=2)
plt.title('Gamma Distribution: Histogram of Samples vs. PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True)
plt.show()

You can see the output in the screenshot below.

gamma distribution python

This is particularly useful for simulations and Monte Carlo methods.

Python Scipy Gamma CDF

The Cumulative Distribution Function (CDF) gives the probability that a random variable is less than or equal to a certain value:

# Calculate CDF values
cdf_values = gamma_dist.cdf(x)

# Plot the CDF
plt.figure(figsize=(10, 6))
plt.plot(x, cdf_values, 'b-', lw=2)
plt.title('Gamma Distribution CDF (shape=2, scale=2)')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.grid(True)
plt.show()

You can see the output in the screenshot below.

python gamma function

This is helpful when you need to find the probability of a value falling within a certain range.

Python Scipy Gamma PPF

The Percent Point Function (PPF), also known as the quantile function, is the inverse of the CDF:

# Calculate quantiles
quantiles = np.linspace(0.01, 0.99, 10)
ppf_values = gamma_dist.ppf(quantiles)

print("Quantiles:")
for q, v in zip(quantiles, ppf_values):
    print(f"At probability {q:.2f}, the value is {v:.4f}")

This is useful when you need to find specific percentiles or create confidence intervals.

Python Scipy Gamma Loc

The gamma() method accepts a loc parameter that shifts the distribution. This is particularly useful when your data doesn’t start at zero:

# Create a gamma distribution with location parameter
loc = 5.0
shifted_gamma = stats.gamma(a=shape, scale=scale, loc=loc)

# Generate x values for the shifted distribution
x_shifted = np.linspace(0, 20, 1000)

# Calculate PDF values for the shifted distribution
pdf_shifted = shifted_gamma.pdf(x_shifted)

# Plot both distributions for comparison
plt.figure(figsize=(10, 6))
plt.plot(x, pdf_values, 'r-', lw=2, label='Original (loc=0)')
plt.plot(x_shifted, pdf_shifted, 'g-', lw=2, label=f'Shifted (loc={loc})')
plt.title('Effect of loc Parameter on Gamma Distribution')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

The loc parameter effectively shifts the entire distribution to the right by the specified amount.

Fit Data to a Gamma Distribution

When analyzing real-world data, we often need to fit it to a gamma distribution:

# Example: Fitting daily rainfall data (in inches) for Seattle
rainfall_data = np.array([0.8, 1.2, 0.5, 1.8, 2.1, 0.7, 1.0, 1.5, 0.9, 1.3, 
                          1.7, 0.6, 1.1, 2.0, 0.4, 1.4, 1.9, 0.3, 1.6, 2.2])

# Fit gamma distribution to the data
params = stats.gamma.fit(rainfall_data)
a, loc, scale = params

print(f"Fitted parameters: shape (a)={a:.4f}, loc={loc:.4f}, scale={scale:.4f}")

# Create the fitted gamma distribution
fitted_gamma = stats.gamma(a=a, loc=loc, scale=scale)

# Plot histogram and fitted PDF
x_fit = np.linspace(0, 3, 1000)
pdf_fit = fitted_gamma.pdf(x_fit)

plt.figure(figsize=(10, 6))
plt.hist(rainfall_data, bins=8, density=True, alpha=0.7, label='Rainfall Data')
plt.plot(x_fit, pdf_fit, 'r-', lw=2, label='Fitted Gamma PDF')
plt.title('Fitting Gamma Distribution to Seattle Rainfall Data')
plt.xlabel('Daily Rainfall (inches)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

This example shows how to fit Seattle rainfall data to a gamma distribution, which is a common application in meteorology.

Check out Python SciPy Spatial Distance Cdist

Hypothesis Testing with Gamma Distribution

We can use the gamma distribution for hypothesis testing:

# Example: Testing if service times follow a gamma distribution
service_times = np.array([5.2, 4.8, 6.1, 5.5, 7.2, 4.9, 5.8, 6.3, 5.1, 5.7,
                         6.0, 5.3, 6.5, 4.7, 5.9, 6.2, 5.4, 6.7, 5.0, 6.4])

# Perform Kolmogorov-Smirnov test against gamma distribution
params = stats.gamma.fit(service_times)
a, loc, scale = params
fitted_gamma = stats.gamma(a=a, loc=loc, scale=scale)

ks_stat, p_value = stats.kstest(service_times, fitted_gamma.cdf)

print(f"Kolmogorov-Smirnov test:")
print(f"KS statistic: {ks_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"The data {'does' if p_value > 0.05 else 'does not'} follow a gamma distribution (at 5% significance)")

This code tests whether a set of service times at a restaurant follows a gamma distribution, which can help with staffing and operational planning.

Read Python SciPy Fcluster

Python Scipy Gamma Quantile

Finding specific quantiles is often needed for confidence intervals:

# Calculate 95% confidence interval
lower_bound = gamma_dist.ppf(0.025)
upper_bound = gamma_dist.ppf(0.975)

print(f"95% Confidence Interval: [{lower_bound:.4f}, {upper_bound:.4f}]")

# Visualize the confidence interval
x_ci = np.linspace(0, 15, 1000)
pdf_ci = gamma_dist.pdf(x_ci)

plt.figure(figsize=(10, 6))
plt.plot(x_ci, pdf_ci, 'r-', lw=2)
plt.fill_between(x_ci, pdf_ci, where=((x_ci >= lower_bound) & (x_ci <= upper_bound)), 
                 color='blue', alpha=0.3, label='95% CI')
plt.axvline(lower_bound, color='g', linestyle='--', label=f'Lower bound: {lower_bound:.4f}')
plt.axvline(upper_bound, color='g', linestyle='--', label=f'Upper bound: {upper_bound:.4f}')
plt.title('Gamma Distribution with 95% Confidence Interval')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

This is particularly useful in risk analysis and financial modeling.

Check out Python SciPy Optimize Root

Compute Moments of the Gamma Distribution

The moments of a distribution give us insights into its characteristics:

# Calculate mean, variance, skewness, and kurtosis
mean = gamma_dist.mean()
variance = gamma_dist.var()
skewness = gamma_dist.stats(moments='s')
kurtosis = gamma_dist.stats(moments='k')

print(f"Mean: {mean:.4f}")
print(f"Variance: {variance:.4f}")
print(f"Skewness: {skewness:.4f}")
print(f"Kurtosis: {kurtosis:.4f}")

These statistics help us understand the shape and properties of our distribution.

Compare Multiple Gamma Distributions

Often, we need to compare different gamma distributions:

# Create several gamma distributions with different parameters
shapes = [1.0, 2.0, 5.0, 9.0]
scale = 1.0

plt.figure(figsize=(10, 6))
x = np.linspace(0, 15, 1000)

for shape in shapes:
    gamma_dist = stats.gamma(a=shape, scale=scale)
    plt.plot(x, gamma_dist.pdf(x), lw=2, label=f'shape={shape}, scale={scale}')

plt.title('Gamma Distributions with Different Shape Parameters')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

This visualization helps in understanding how changing parameters affect the distribution’s shape.

Practical Example: Insurance Claims Analysis

Let’s apply the gamma distribution to a real-world scenario – analyzing insurance claim amounts:

# Simulated insurance claim data (in thousands of dollars)
claim_amounts = np.array([3.2, 5.7, 2.8, 8.1, 4.5, 6.3, 3.9, 7.2, 5.1, 9.6,
                         4.8, 2.6, 6.7, 3.5, 5.3, 7.8, 4.2, 6.9, 8.5, 3.7,
                         5.9, 4.1, 7.4, 2.9, 6.1, 8.3, 3.4, 5.6, 9.2, 4.7
                         4.8, 2.6, 6.7, 3.5, 5.3, 7.8, 4.2, 6.9, 8.5, 3.7,
                         5.9, 4.1, 7.4, 2.9, 6.1, 8.3, 3.4, 5.6, 9.2, 4.7])

# Fit gamma distribution to the claim data
params = stats.gamma.fit(claim_amounts)
a, loc, scale = params
print(f"Fitted parameters for insurance claims:")
print(f"Shape (a) = {a:.4f}, loc = {loc:.4f}, scale = {scale:.4f}")

# Create the fitted gamma distribution
fitted_gamma = stats.gamma(a=a, loc=loc, scale=scale)

# Calculate expected claim amount
expected_claim = fitted_gamma.mean()
print(f"Expected claim amount: ${expected_claim:.2f}k")

# Calculate probability of a claim exceeding $10k
prob_exceed_10k = 1 - fitted_gamma.cdf(10)
print(f"Probability of a claim exceeding $10k: {prob_exceed_10k:.4f} or {prob_exceed_10k*100:.2f}%")

# Plot the data and fitted distribution
x_fit = np.linspace(0, 15, 1000)
pdf_fit = fitted_gamma.pdf(x_fit)

plt.figure(figsize=(10, 6))
plt.hist(claim_amounts, bins=10, density=True, alpha=0.7, label='Claim Data')
plt.plot(x_fit, pdf_fit, 'r-', lw=2, label='Fitted Gamma PDF')
plt.axvline(expected_claim, color='g', linestyle='--', 
            label=f'Expected claim: ${expected_claim:.2f}k')
plt.title('Insurance Claim Amount Analysis using Gamma Distribution')
plt.xlabel('Claim Amount (thousands of dollars)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

This example demonstrates how insurance companies might use the gamma distribution to model claim amounts, calculate expected losses, and determine risk thresholds.

Check out Python SciPy Interpolate

Work with the Special Case: Gamma Function

The gamma distribution is related to the gamma function. SciPy provides the gamma function through scipy.special:

from scipy.special import gamma

# Calculate gamma function for different values
x_values = [1, 2, 3, 4, 5]
for x in x_values:
    gamma_x = gamma(x)
    print(f"Gamma({x}) = {gamma_x}")

# For integer values, gamma(n) = (n-1)!
for x in x_values:
    if x.is_integer():
        factorial = np.math.factorial(int(x)-1)
        print(f"({int(x)}-1)! = {factorial}")

Understanding the gamma function is important when working with certain statistical applications and mathematical models.

Read Python Scipy Odeint

The Gamma Distribution in Action

Throughout my career, I’ve found the gamma distribution incredibly useful for modeling various phenomena. It’s particularly well-suited for analyzing:

  • Insurance claim amounts
  • Rainfall and weather patterns
  • Service times in queuing theory
  • Machine reliability and lifetime analysis
  • Cell phone call durations

The flexibility of the gamma distribution, combined with SciPy’s robust implementation, makes it a powerful tool in a data scientist’s toolkit.

The next time you’re dealing with positive, right-skewed data, consider whether a gamma distribution might be an appropriate model. With the examples and techniques shown in this article, you’ll be well-equipped to apply them to your data analysis challenges.

Remember that proper parameter fitting is crucial for accurate modeling, and always validate your distribution choice with appropriate statistical tests.

You may also read:

51 Python Programs

51 PYTHON PROGRAMS PDF FREE

Download a FREE PDF (112 Pages) Containing 51 Useful Python Programs.

pyython developer roadmap

Aspiring to be a Python developer?

Download a FREE PDF on how to become a Python developer.

Let’s be friends

Be the first to know about sales and special discounts.