Python SciPy Curve Fit: Simplify Your Data Analysis with Fitting Methods

Recently, I was working on a data science project where I needed to fit a curve to my experimental data points. The challenge was finding a tool that could handle complex fitting while remaining easy to use. That’s when SciPy’s curve_fit function came to the rescue.

In this article, I’ll cover several ways you can use SciPy’s curve_fit to fit functions to your data (including linear, polynomial, and custom models).

So let’s get started!

What is SciPy Curve Fit?

SciPy’s curve_fit is a useful function from the scipy.optimize module that fits a mathematical function to data points. It uses non-linear least squares to fit any user-defined function to data.

Think of curve_fit as your “pattern finder.” You give it your data points and a function form, and it finds the best parameters that make your function match those points as closely as possible.

For example, if you’re analyzing sales data that seems to follow an exponential growth pattern, curve_fit can tell you exactly what that exponential function is.
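To make that concrete, here’s a minimal sketch of the whole workflow, using made-up monthly sales figures: define a model, hand it to curve_fit with the data, and read off the parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical monthly sales figures that look roughly exponential
months = np.arange(6)
sales = np.array([100.0, 148.0, 224.0, 330.0, 495.0, 740.0])

def growth(t, a, r):
    # a: starting level, r: growth rate per month
    return a * np.exp(r * t)

params, _ = curve_fit(growth, months, sales, p0=[100, 0.5])
a_fit, r_fit = params
print(f"sales ≈ {a_fit:.1f} * e^({r_fit:.2f}t)")
```

Here curve_fit recovers a starting level near 100 and a growth rate near 0.4 per month, which is the pattern baked into the sample numbers.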

Fitting Methods

Now, I’ll walk through the fitting methods that can simplify your data analysis.

1: Linear Curve Fitting

The simplest case is fitting a straight line to your data. Here’s how you can do it:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate some sample data with noise
x_data = np.linspace(0, 10, 20)
y_data = 2 * x_data + 3 + np.random.normal(0, 1, 20)

# Define the linear function to fit
def linear_func(x, m, b):
    return m * x + b

# Fit the data
params, covariance = curve_fit(linear_func, x_data, y_data)

# Get the fitted parameters
m_fit, b_fit = params
print(f"Fitted line: y = {m_fit:.2f}x + {b_fit:.2f}")

# Plot the results
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, linear_func(x_data, m_fit, b_fit), 'r-', label='Fit')
plt.legend()
plt.show()

In this example, I’m fitting a line of the form y = mx + b to noisy data. The curve_fit function returns the optimal parameters (m and b) that minimize the sum of squared differences between the data and the model.
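The second return value, the covariance matrix, is worth using too: the square roots of its diagonal give one-sigma standard errors on the fitted parameters. A short sketch, on synthetic data like the example above (seeded so the numbers are reproducible):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x_data = np.linspace(0, 10, 20)
y_data = 2 * x_data + 3 + rng.normal(0, 1, 20)

def linear_func(x, m, b):
    return m * x + b

params, covariance = curve_fit(linear_func, x_data, y_data)
m_fit, b_fit = params

# One-sigma uncertainties are the square roots of the covariance diagonal
perr = np.sqrt(np.diag(covariance))
print(f"m = {m_fit:.2f} ± {perr[0]:.2f}")
print(f"b = {b_fit:.2f} ± {perr[1]:.2f}")
```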

2: Polynomial Curve Fitting

When your data follows a more complex pattern, you might want to try fitting a polynomial:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate some sample data that follows a quadratic pattern
x_data = np.linspace(-5, 5, 30)
y_data = 2 * x_data**2 - 3 * x_data + 1 + np.random.normal(0, 3, 30)

# Define a quadratic function
def quadratic_func(x, a, b, c):
    return a * x**2 + b * x + c

# Fit the data
params, covariance = curve_fit(quadratic_func, x_data, y_data)

# Get the fitted parameters
a_fit, b_fit, c_fit = params
print(f"Fitted curve: y = {a_fit:.2f}x² + {b_fit:.2f}x + {c_fit:.2f}")

# Plot the results
plt.scatter(x_data, y_data, label='Data')
x_fine = np.linspace(-5, 5, 100)
plt.plot(x_fine, quadratic_func(x_fine, a_fit, b_fit, c_fit), 'r-', label='Fit')
plt.legend()
plt.show()

This example fits a quadratic function (ax² + bx + c) to data that roughly follows a parabolic shape. You can extend this to higher-degree polynomials by adding more parameters.
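As a side note, for plain polynomials you don’t even need a hand-written model: NumPy’s polyfit fits any degree directly. A quick sketch on similar synthetic data (seeded for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(1)
x_data = np.linspace(-5, 5, 30)
y_data = 2 * x_data**2 - 3 * x_data + 1 + rng.normal(0, 3, 30)

# polyfit returns coefficients from highest to lowest degree: [a, b, c]
coeffs = np.polyfit(x_data, y_data, deg=2)
print("a, b, c =", np.round(coeffs, 2))
```

curve_fit is still the better choice once the model stops being a polynomial, but for quick polynomial fits polyfit saves a few lines.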

3: Exponential Curve Fitting

Exponential models are common in many fields, from finance to population growth:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate sample data that follows exponential growth
x_data = np.linspace(0, 4, 20)
y_data = 2 * np.exp(1.5 * x_data) + np.random.normal(0, 2, 20)

# Define an exponential function
def exp_func(x, a, b):
    return a * np.exp(b * x)

# Fit the data
params, covariance = curve_fit(exp_func, x_data, y_data)

# Get the fitted parameters
a_fit, b_fit = params
print(f"Fitted exponential: y = {a_fit:.2f} * e^({b_fit:.2f}x)")

# Plot the results
plt.scatter(x_data, y_data, label='Data')
x_fine = np.linspace(0, 4, 100)
plt.plot(x_fine, exp_func(x_fine, a_fit, b_fit), 'r-', label='Fit')
plt.legend()
plt.show()

This fits an exponential function a·eᵇˣ to the data. This is particularly useful for modeling phenomena like population growth, compound interest, or radioactive decay.
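Exponential fits can be sensitive to starting values. When the y values are positive, a common trick is to fit a straight line to log(y) first and use the result as p0 — a sketch, with synthetic data like the example above (seeded for reproducibility):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x_data = np.linspace(0, 4, 20)
y_data = 2 * np.exp(1.5 * x_data) + rng.normal(0, 2, 20)

# log(y) ≈ log(a) + b*x, so a line fit to log(y) gives good starting values
mask = y_data > 0  # the log only works for positive values
slope, intercept = np.polyfit(x_data[mask], np.log(y_data[mask]), 1)
p0 = [np.exp(intercept), slope]

def exp_func(x, a, b):
    return a * np.exp(b * x)

params, _ = curve_fit(exp_func, x_data, y_data, p0=p0)
print(f"a = {params[0]:.2f}, b = {params[1]:.2f}")
```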

4: Set Bounds for Parameters

Sometimes you have prior knowledge about the possible values of your parameters. SciPy allows you to set bounds:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate sample data for a sigmoid curve
x_data = np.linspace(-5, 5, 30)
y_data = 10 / (1 + np.exp(-x_data)) + np.random.normal(0, 0.5, 30)

# Define a sigmoid function
def sigmoid(x, L, x0, k):
    return L / (1 + np.exp(-k * (x - x0)))

# Set bounds for the parameters (L, x0, k)
# L should be positive, x0 can be any value, k should be positive
bounds = ([5, -np.inf, 0], [15, np.inf, 10])

# Fit with bounds
params, covariance = curve_fit(sigmoid, x_data, y_data, bounds=bounds)

# Get the fitted parameters
L_fit, x0_fit, k_fit = params
print(f"Fitted sigmoid: L={L_fit:.2f}, x0={x0_fit:.2f}, k={k_fit:.2f}")

# Plot the results
plt.scatter(x_data, y_data, label='Data')
x_fine = np.linspace(-5, 5, 100)
plt.plot(x_fine, sigmoid(x_fine, L_fit, x0_fit, k_fit), 'r-', label='Fit')
plt.legend()
plt.show()

In this example, I’m fitting a sigmoid function with parameters that have specific bounds. This is useful when you know certain physical or logical constraints on your parameters.
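A few practical notes on bounds: they are passed as a (lower, upper) pair, np.inf and -np.inf leave a side unconstrained, and providing bounds switches curve_fit from the default Levenberg-Marquardt method to the bound-aware 'trf' algorithm. Supplying a p0 that lies inside the bounds usually helps, as this seeded sketch shows:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)
x_data = np.linspace(-5, 5, 30)
y_data = 10 / (1 + np.exp(-x_data)) + rng.normal(0, 0.5, 30)

def sigmoid(x, L, x0, k):
    return L / (1 + np.exp(-k * (x - x0)))

# bounds = (lower, upper); np.inf / -np.inf leaves a side unconstrained
bounds = ([5, -np.inf, 0], [15, np.inf, 10])
p0 = [10, 0, 1]  # must lie inside the bounds

params, _ = curve_fit(sigmoid, x_data, y_data, p0=p0, bounds=bounds)
L_fit, x0_fit, k_fit = params
print(f"L={L_fit:.2f}, x0={x0_fit:.2f}, k={k_fit:.2f}")
```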

5: Weighted Curve Fitting

If your data points have different levels of uncertainty, you can assign weights to give more importance to more reliable points:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate sample data with varying noise levels
x_data = np.linspace(0, 10, 20)
# Increasing noise with x
sigma = 0.5 + 0.5 * x_data
y_true = 2 * x_data - 1
y_data = y_true + np.random.normal(0, sigma, 20)

# Define a linear function
def linear_func(x, m, b):
    return m * x + b

# Fit with weights: passing sigma weights each residual by 1/sigma,
# so noisier points influence the fit less
params, covariance = curve_fit(linear_func, x_data, y_data, sigma=sigma, absolute_sigma=True)

# Get the fitted parameters
m_fit, b_fit = params
print(f"Weighted fit: y = {m_fit:.2f}x + {b_fit:.2f}")

# Plot the results
plt.errorbar(x_data, y_data, yerr=sigma, fmt='o', label='Data with error bars')
plt.plot(x_data, linear_func(x_data, m_fit, b_fit), 'r-', label='Weighted fit')
plt.plot(x_data, y_true, 'g--', label='True function')
plt.legend()
plt.show()

This example shows how to handle data with varying uncertainty. By providing the sigma parameter to curve_fit, points with smaller sigma values influence the fit more strongly.
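One subtlety worth knowing: absolute_sigma changes only the reported covariance, not the fitted parameters. With absolute_sigma=True, the sigma values are treated as real standard deviations; with the default False, they only set relative weights and the covariance is rescaled to match the observed residuals. A seeded sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
x_data = np.linspace(0, 10, 20)
sigma = 0.5 + 0.5 * x_data
y_data = 2 * x_data - 1 + rng.normal(0, sigma)

def linear_func(x, m, b):
    return m * x + b

# Same sigma, different absolute_sigma: the parameters come out identical,
# only the covariance (and hence the error bars) changes
p_abs, cov_abs = curve_fit(linear_func, x_data, y_data, sigma=sigma, absolute_sigma=True)
p_rel, cov_rel = curve_fit(linear_func, x_data, y_data, sigma=sigma, absolute_sigma=False)

print("params identical:", np.allclose(p_abs, p_rel))
print("std errors (absolute):", np.sqrt(np.diag(cov_abs)))
print("std errors (relative):", np.sqrt(np.diag(cov_rel)))
```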

6: Handle Initial Parameter Guesses

Sometimes, curve_fit struggles to find the right parameters without good initial guesses. Here’s how to provide starting values:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Generate some sample data for a damped oscillation
x_data = np.linspace(0, 10, 100)
y_data = 3 * np.exp(-0.5 * x_data) * np.cos(2 * np.pi * x_data) + np.random.normal(0, 0.2, 100)

# Define the damped oscillation function
def damped_osc(x, A, decay, freq, phase):
    return A * np.exp(-decay * x) * np.cos(2 * np.pi * freq * x + phase)

# Without good initial guesses, this might fail or give poor results
# Let's provide initial guesses
initial_guesses = [3, 0.5, 1.0, 0]

# Fit with initial guesses
params, covariance = curve_fit(damped_osc, x_data, y_data, p0=initial_guesses)

# Get the fitted parameters
A_fit, decay_fit, freq_fit, phase_fit = params
print(f"Amplitude: {A_fit:.2f}")
print(f"Decay constant: {decay_fit:.2f}")
print(f"Frequency: {freq_fit:.2f}")
print(f"Phase: {phase_fit:.2f}")

# Plot the results
plt.scatter(x_data, y_data, s=10, label='Data')
x_fine = np.linspace(0, 10, 1000)
plt.plot(x_fine, damped_osc(x_fine, *params), 'r-', label='Fitted curve')
plt.legend()
plt.title("Damped Oscillation Curve Fitting")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.show()

The p0 parameter in curve_fit is crucial for complex functions like damped oscillations. Without good initial guesses, the optimization might get stuck in local minima or fail to converge.
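Related to this: when the optimizer runs out of function evaluations before converging, curve_fit raises a RuntimeError. The maxfev argument lifts that cap (the default scales with the number of parameters and can be too low for wiggly models), and it combines naturally with p0. A seeded sketch on the same kind of damped-oscillation data:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)
x_data = np.linspace(0, 10, 100)
y_data = 3 * np.exp(-0.5 * x_data) * np.cos(2 * np.pi * x_data) + rng.normal(0, 0.2, 100)

def damped_osc(x, A, decay, freq, phase):
    return A * np.exp(-decay * x) * np.cos(2 * np.pi * freq * x + phase)

# maxfev lifts the cap on function evaluations; combined with good
# initial guesses, it gives stubborn fits room to converge
params, _ = curve_fit(damped_osc, x_data, y_data, p0=[3, 0.5, 1.0, 0], maxfev=5000)
print("fitted:", np.round(params, 2))
```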

7: Custom Error Function Fitting

Sometimes you might want to use a different error metric than the default sum of squares. Here’s an example of a custom approach:

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

# Generate some sample data with outliers
x_data = np.linspace(0, 10, 20)
y_true = 2 * x_data + 1
# Add noise and some outliers
y_data = y_true + np.random.normal(0, 0.5, 20)
y_data[3] += 5  # Add an outlier
y_data[15] -= 7  # Add another outlier

# Define the model function
def linear_func(x, params):
    m, b = params
    return m * x + b

# Define a custom error function that is less sensitive to outliers
# Using mean absolute error instead of mean squared error
def custom_error(params, x, y):
    y_pred = linear_func(x, params)
    # Mean absolute error
    return np.mean(np.abs(y - y_pred))

# Initial guess
initial_guess = [1, 0]

# Perform the optimization
result = minimize(custom_error, initial_guess, args=(x_data, y_data))

# Get the fitted parameters
m_fit, b_fit = result.x
print(f"Robust fit: y = {m_fit:.2f}x + {b_fit:.2f}")

# Compare with standard curve_fit (which uses least squares)
from scipy.optimize import curve_fit

def std_linear(x, m, b):
    return m * x + b

params_std, _ = curve_fit(std_linear, x_data, y_data)
m_std, b_std = params_std
print(f"Standard fit: y = {m_std:.2f}x + {b_std:.2f}")

# Plot the results
plt.scatter(x_data, y_data, label='Data with outliers')
plt.plot(x_data, linear_func(x_data, [m_fit, b_fit]), 'r-', label='Robust fit (MAE)')
plt.plot(x_data, std_linear(x_data, m_std, b_std), 'g--', label='Standard fit (MSE)')
plt.plot(x_data, y_true, 'k:', label='True relationship')
plt.legend()
plt.title("Comparison of Fitting Methods with Outliers")
plt.show()

This example demonstrates how to implement a custom error function that’s more robust to outliers by using mean absolute error instead of mean squared error. For data with outliers, this can provide a more accurate fit.
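An alternative to writing your own error function is scipy.optimize.least_squares, which has built-in robust loss functions such as 'soft_l1' and 'huber' that down-weight large residuals. A seeded sketch on outlier-laden data like the example above:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(6)
x_data = np.linspace(0, 10, 20)
y_data = 2 * x_data + 1 + rng.normal(0, 0.5, 20)
y_data[3] += 5   # outlier
y_data[15] -= 7  # outlier

def residuals(params, x, y):
    m, b = params
    return m * x + b - y

# loss='soft_l1' grows sub-quadratically, so outliers pull the fit less
result = least_squares(residuals, x0=[1, 0], loss='soft_l1', args=(x_data, y_data))
m_fit, b_fit = result.x
print(f"robust fit: y = {m_fit:.2f}x + {b_fit:.2f}")
```

Note that least_squares takes a residual function rather than a model function, which is why the code computes m * x + b - y directly.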

Real-World Example: Stock Market Growth

Let’s apply curve fitting to a real-world scenario, analyzing the growth of the S&P 500 index:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime

# Sample S&P 500 data (year, value)
years = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])
sp500 = np.array([1257, 1257, 1426, 1848, 2058, 2043, 2238, 2673, 2506, 3230, 3756, 4766, 3839])

# Convert years to numeric values for fitting (years since 2010)
x_data = years - 2010

# Try different models
def linear_model(x, m, b):
    return m * x + b

def exp_model(x, a, b, c):
    return a * np.exp(b * x) + c

def power_model(x, a, b, c):
    return a * (x + c) ** b

# Fit all models
params_linear, _ = curve_fit(linear_model, x_data, sp500)
params_exp, _ = curve_fit(exp_model, x_data, sp500, p0=[1000, 0.1, 1000])
params_power, _ = curve_fit(power_model, x_data, sp500, p0=[100, 1, 1])

# Calculate R-squared for each model
def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res / ss_tot)

y_linear = linear_model(x_data, *params_linear)
y_exp = exp_model(x_data, *params_exp)
y_power = power_model(x_data, *params_power)

r2_linear = r_squared(sp500, y_linear)
r2_exp = r_squared(sp500, y_exp)
r2_power = r_squared(sp500, y_power)

print(f"Linear model: R² = {r2_linear:.4f}")
print(f"Exponential model: R² = {r2_exp:.4f}")
print(f"Power model: R² = {r2_power:.4f}")

# Create prediction for future years
future_years = np.arange(2010, 2031)
future_x = future_years - 2010
best_model = exp_model  # Choose the model with highest R²
best_params = params_exp

# Plot the results
plt.figure(figsize=(12, 6))
plt.scatter(years, sp500, color='blue', label='Actual S&P 500')
plt.plot(future_years, best_model(future_x, *best_params), 'r-', 
         label=f'Exponential model (R² = {r2_exp:.4f})')
plt.xlabel('Year')
plt.ylabel('S&P 500 Index Value')
plt.title('S&P 500 Historical Data and Future Projection')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Print the prediction for 2030
print(f"Projected S&P 500 value for 2030: {best_model(2030-2010, *best_params):.0f}")

In this real-world example, I’ve compared different models to see which best describes the S&P 500’s growth. The exponential model typically provides the best fit for long-term market growth, and we can use it to project future years. Keep in mind, though, that extrapolating an exponential well beyond the observed data is inherently uncertain.

Tips for Successful Curve Fitting

After years of working with curve_fit, here are some practices I’ve found helpful:

  1. Choose an appropriate model: Make sure your function matches the expected behavior of your data. Plot your data first to get an intuitive feel.
  2. Provide good initial guesses: For complex models, starting close to the solution helps avoid convergence problems.
  3. Scale your data: If your x and y values are on very different scales, normalize them first for better numerical stability.
  4. Handle bounds carefully: Use parameter bounds when you have domain knowledge about acceptable values.
  5. Evaluate your fit: Always calculate metrics like R-squared or residuals to assess how well your model fits.
  6. Be wary of overfitting: A more complex model might fit better, but generalize worse. Consider using cross-validation for model selection.
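For tip 5, evaluating a fit takes only a few lines: compute the residuals, then R² from the residual and total sums of squares. A seeded sketch with synthetic linear data:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
x_data = np.linspace(0, 10, 30)
y_data = 2 * x_data + 3 + rng.normal(0, 1, 30)

def linear_func(x, m, b):
    return m * x + b

params, _ = curve_fit(linear_func, x_data, y_data)
y_pred = linear_func(x_data, *params)

resid = y_data - y_pred
ss_res = np.sum(resid**2)
ss_tot = np.sum((y_data - np.mean(y_data))**2)
r2 = 1 - ss_res / ss_tot
print(f"R² = {r2:.3f}")
# Residuals from a good fit look like random noise centred on zero
print(f"mean residual = {np.mean(resid):.3f}")
```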

SciPy’s curve_fit is an invaluable tool for data analysis that has saved me countless hours across numerous projects. Whether you’re analyzing financial trends, scientific experiments, or marketing data, mastering this function will significantly enhance your analytical capabilities.

If you’re working with data that needs fitting, give these methods a try. You’ll be surprised at how much information you can extract from seemingly chaotic data points with just a few lines of code.
