Best Fit a Line to a Scatter Plot in Python Matplotlib

As a developer, I was working on a project where I had to analyze sales data from different US states and show whether there was a clear trend between advertising spend and revenue.

The scatter plot gave me a good picture, but it was difficult to explain the overall trend to my team. That’s when I decided to add a best-fit line to the scatter plot using Python Matplotlib.

In this tutorial, I will share the exact methods I use to add a best-fit line to a scatter plot in Python Matplotlib. I’ll cover multiple approaches so you can pick the one that works best for your dataset.

Why Add a Best Fit Line?

When you plot a scatter plot in Python, you see individual data points. This is great for raw visualization, but it can be hard to identify the overall relationship between two variables.

Adding a best-fit line (also called a trendline or regression line) helps you summarize the relationship in a single line. It makes it easier to explain trends, predict values, and communicate insights clearly.

Method 1 – Best Fit Line Using numpy.polyfit in Python

The first method I often use is NumPy’s numpy.polyfit. It’s quick, simple, and works well when I want a straight-line fit for my scatter plot.

Here’s a practical example where I plot advertising spend vs. sales revenue for a few US states.

import numpy as np
import matplotlib.pyplot as plt

# Sample data: Advertising spend vs Sales revenue (in thousands of USD)
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

# Scatter plot
plt.scatter(advertising_spend, sales_revenue, color='blue', label='Data points')

# Fit line using numpy.polyfit
slope, intercept = np.polyfit(advertising_spend, sales_revenue, 1)
best_fit_line = slope * advertising_spend + intercept

# Plot the best fit line
# The '+' format flag prints the intercept's sign, so a negative intercept reads correctly
plt.plot(advertising_spend, best_fit_line, color='red', label=f'Best Fit Line: y={slope:.2f}x{intercept:+.2f}')

plt.xlabel('Advertising Spend (in $1000)')
plt.ylabel('Sales Revenue (in $1000)')
plt.title('Best Fit Line using numpy.polyfit in Python')
plt.legend()
plt.show()

You can refer to the screenshot below to see the output.

[Screenshot: Best Fit Line to Scatter Plot in Python Matplotlib]

This code creates a scatter plot of the data and overlays a red best-fit line. You pass your X and Y values to np.polyfit along with a degree of 1, and it returns the coefficients from the highest degree down, which for a straight line means the slope first and then the intercept.
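To make it clearer what a degree-1 fit computes, here is a minimal sketch that derives the slope and intercept from the closed-form least-squares formulas and compares them with the np.polyfit result. The names manual_slope and manual_intercept are my own and exist only for this illustration.

```python
import numpy as np

# Same sample data as above (in thousands of USD)
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

# Closed-form least squares for a straight line:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)
#   intercept = mean_y - slope * mean_x
x_dev = advertising_spend - advertising_spend.mean()
y_dev = sales_revenue - sales_revenue.mean()
manual_slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
manual_intercept = sales_revenue.mean() - manual_slope * advertising_spend.mean()

# np.polyfit returns coefficients from the highest degree to the lowest
slope, intercept = np.polyfit(advertising_spend, sales_revenue, 1)

print(f"Manual:  slope={manual_slope:.4f}, intercept={manual_intercept:.4f}")
print(f"polyfit: slope={slope:.4f}, intercept={intercept:.4f}")
print("Match:", np.allclose([manual_slope, manual_intercept], [slope, intercept]))
```

Both approaches should agree up to floating-point rounding, which is a handy sanity check when you are unsure about the coefficient order.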

Method 2 – Best Fit Line Using Python’s scipy.stats.linregress

Another method I often use is scipy.stats.linregress. This not only gives you the slope and intercept but also provides additional statistics: the correlation coefficient, the p-value, and the standard error of the slope.

Here’s how I apply it in Python:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Sample data: Advertising spend vs Sales revenue
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

# Scatter plot
plt.scatter(advertising_spend, sales_revenue, color='green', label='Data points')

# Linear regression using scipy
slope, intercept, r_value, p_value, std_err = stats.linregress(advertising_spend, sales_revenue)
best_fit_line = slope * advertising_spend + intercept

# Plot the best fit line
# The '+' format flag prints the intercept's sign, so a negative intercept reads correctly
plt.plot(advertising_spend, best_fit_line, color='orange', label=f'Line: y={slope:.2f}x{intercept:+.2f}')

plt.xlabel('Advertising Spend (in $1000)')
plt.ylabel('Sales Revenue (in $1000)')
plt.title('Best Fit Line using scipy.stats.linregress in Python')
plt.legend()
plt.show()

# Print additional statistics
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
print(f"R-squared: {r_value**2:.2f}")

You can refer to the screenshot below to see the output.

[Screenshot: Matplotlib Best Fit Line to Scatter Plot in Python]

This approach is useful when I want more than just the line. For example, the R-squared value tells me how well the line explains the variability in the data.
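If you prefer not to unpack five values, linregress also returns a result object with named attributes. The sketch below reads the statistics by name and skips the plotting. The variable r_squared is my own helper name, not something SciPy returns.

```python
import numpy as np
from scipy import stats

# Same sample data as above (in thousands of USD)
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

result = stats.linregress(advertising_spend, sales_revenue)

# R-squared is the square of the correlation coefficient (rvalue)
r_squared = result.rvalue ** 2

print(f"Slope: {result.slope:.2f} (standard error {result.stderr:.2f})")
print(f"Intercept: {result.intercept:.2f}")
print(f"R-squared: {r_squared:.3f}")
print(f"p-value: {result.pvalue:.2e}")
```

An R-squared close to 1 together with a small p-value suggests the straight line describes this sample well. With only eight points, I would still treat that as a rough indication and not as proof.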

Method 3 – Best Fit Line Using seaborn.regplot in Python

If you’re already using Seaborn for visualization, you can add a best-fit line with just one line of code using regplot.

Here’s an example:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
advertising_spend = [5, 10, 15, 20, 25, 30, 35, 40]
sales_revenue = [12, 25, 33, 45, 52, 60, 72, 85]

# Scatter plot with best fit line
sns.regplot(x=advertising_spend, y=sales_revenue, color='purple', line_kws={"color":"red"})

plt.xlabel('Advertising Spend (in $1000)')
plt.ylabel('Sales Revenue (in $1000)')
plt.title('Best Fit Line using Seaborn regplot in Python')
plt.show()

You can refer to the screenshot below to see the output.

[Screenshot: Python Best Fit Line to Scatter Plot in Matplotlib]

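This is the simplest method if you want both a scatter plot and a best-fit line in one go. By default, regplot also draws a shaded 95% confidence band around the line; pass ci=None if you want the line only. I often use it when I’m doing exploratory data analysis.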

Method 4 – Best Fit Line Using Python’s numpy.poly1d

Sometimes, I prefer to use numpy.poly1d because it makes the equation reusable. You can create a polynomial function and use it for predictions as well.

Here’s how it works:

import numpy as np
import matplotlib.pyplot as plt

# Sample data
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

# Fit line using polyfit
coefficients = np.polyfit(advertising_spend, sales_revenue, 1)
polynomial = np.poly1d(coefficients)

# Generate line values
x_line = np.linspace(min(advertising_spend), max(advertising_spend), 100)
y_line = polynomial(x_line)

# Plot
plt.scatter(advertising_spend, sales_revenue, color='blue', label='Data points')
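# Build the label from the coefficients; str(polynomial) spans multiple lines and clutters the legend
plt.plot(x_line, y_line, color='red', label=f'Best Fit Line: y={coefficients[0]:.2f}x{coefficients[1]:+.2f}')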

plt.xlabel('Advertising Spend (in $1000)')
plt.ylabel('Sales Revenue (in $1000)')
plt.title('Best Fit Line using numpy.poly1d in Python')
plt.legend()
plt.show()

I like this method because once you have the polynomial object, you can easily calculate predictions for new values of advertising spend.
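Here is a small sketch of that prediction step. The new advertising budgets in new_spend are hypothetical values I picked for illustration. Note that 45 and 50 lie outside the observed range of 5 to 40, so those predictions are extrapolations and deserve extra caution.

```python
import numpy as np

# Same sample data as above (in thousands of USD)
advertising_spend = np.array([5, 10, 15, 20, 25, 30, 35, 40])
sales_revenue = np.array([12, 25, 33, 45, 52, 60, 72, 85])

# Build a reusable function from the fitted coefficients
polynomial = np.poly1d(np.polyfit(advertising_spend, sales_revenue, 1))

# Hypothetical new budgets (in $1000), chosen only for illustration
new_spend = np.array([22, 45, 50])
predicted_revenue = polynomial(new_spend)

for spend, revenue in zip(new_spend, predicted_revenue):
    print(f"Spend ${spend}k -> predicted revenue ${revenue:.1f}k")
```

Calling the polynomial object works on a single number or on a whole array, so the same object serves both the plot and the forecast.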

Tips for Using Best Fit Lines in Python

  • Always visualize your scatter plot before fitting a line. If the data doesn’t look linear, a straight line may not be the right choice.
  • Use R-squared (the square of the r-value returned by scipy.stats.linregress) or the p-value to check if the line is meaningful.
  • For non-linear data, consider polynomial fits with higher degrees using numpy.polyfit.
  • Keep your audience in mind. For business presentations, a simple line is often more effective than a complex curve.
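To illustrate the tip about non-linear data, here is a minimal sketch that fits a straight line and a degree-2 polynomial to the same points and compares their squared errors. The data is made up to show a visible curve and is not from the sales example. The helper sum_squared_error is my own name, not a NumPy function.

```python
import numpy as np

# Hypothetical, made-up data with a visible curve (not the sales data above)
x = np.arange(1, 9)
y = np.array([2, 5, 11, 18, 28, 39, 53, 68])

def sum_squared_error(degree):
    """Fit a polynomial of the given degree and return the sum of squared residuals."""
    coefficients = np.polyfit(x, y, degree)
    fitted = np.polyval(coefficients, x)
    return float(np.sum((y - fitted) ** 2))

sse_linear = sum_squared_error(1)
sse_quadratic = sum_squared_error(2)

print(f"Degree 1 squared error: {sse_linear:.2f}")
print(f"Degree 2 squared error: {sse_quadratic:.2f}")
```

Keep in mind that a higher degree never increases the error on the points it was fitted to, so a lower number alone does not prove the curve is the better model. I would use it together with a look at the plot.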

When I first started adding best-fit lines to scatter plots in Python, I thought it would take a lot of code. But as you can see, there are multiple simple ways to do it depending on your needs.

If you just want a quick visualization, seaborn.regplot is perfect. If you need more control and statistics, scipy.stats.linregress is my go-to. And if I want to reuse the equation for predictions, I prefer numpy.poly1d.

All these methods are easy to apply once you know them, and they can make your scatter plots much more powerful and insightful.
