Plot A Best Fit Line In Matplotlib

I’ve found that visualizing data effectively is just as important as analyzing it. One of the most common tasks I encounter is plotting a best-fit line to understand trends and relationships within data.

Matplotlib, Python’s go-to plotting library, provides easy ways to add a best-fit line to your scatter plots. In this article, I’ll walk you through different methods to plot a best-fit line using Matplotlib. These techniques will help you present your data clearly and professionally.

Let’s get started!


What is a Best Fit Line?

A best-fit line, also known as a trend line or regression line, is a straight line that best represents the relationship between two variables. The most common kind, the ordinary least-squares line, minimizes the sum of the squared vertical distances (residuals) between the data points and the line. Its slope shows whether there’s a positive, negative, or no correlation between the variables.

For example, if you’re analyzing monthly sales data across different U.S. regions, a best-fit line can show if sales are generally increasing or decreasing over time.
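Under the hood, the usual best-fit line is the ordinary least-squares line, the one that minimizes the sum of squared vertical distances (residuals) from the points. For a straight line the slope and intercept have a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The sketch below computes them directly with NumPy for the monthly sales figures used later in this tutorial and checks the result against np.polyfit. The helper name least_squares_line is illustrative, not part of any library.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line through (x, y).

    Hypothetical helper for illustration; np.polyfit(x, y, 1) gives the same result.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point of means (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Monthly sales figures (same values as in Method 1)
months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])

slope, intercept = least_squares_line(months, sales)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

# np.polyfit with degree 1 solves the same least-squares problem
print(np.allclose((slope, intercept), np.polyfit(months, sales, 1)))
```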

Method 1: Use NumPy’s polyfit Function with Matplotlib

The quickest way I know to plot a best-fit line is NumPy’s polyfit function, which fits a polynomial of a chosen degree to your data by least squares. With degree 1, that polynomial is a straight line.

Steps:

Import the necessary libraries:

import numpy as np
import matplotlib.pyplot as plt

Create your data (for instance, monthly sales figures):

months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])

Plot your scatter plot:

plt.scatter(months, sales, color='blue', label='Sales Data')

Calculate the best-fit line coefficients:

coefficients = np.polyfit(months, sales, 1)  # degree 1 = straight line
slope, intercept = coefficients  # polyfit returns the highest power first: [slope, intercept]

Generate the y-values for the best-fit line:

best_fit_line = slope * months + intercept

Plot the best-fit line:

plt.plot(months, best_fit_line, color='red', label='Best Fit Line')

Add labels and legend, then show the plot:

plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales with Best Fit Line')
plt.legend()
plt.show()

I executed the above example code and added the screenshot below.

This method is fast and effective for linear trends. I often use it for quick exploratory data analysis during projects.
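The steps above rely on pyplot’s implicit “current axes”. If you prefer Matplotlib’s object-oriented interface, here is a minimal sketch of the same polyfit workflow as one self-contained script, using explicit Figure and Axes objects from plt.subplots(). Writing the fitted equation into the legend label is an optional extra, not part of the steps above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Monthly sales figures from the example above
months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])

# Degree-1 fit: polyfit returns [slope, intercept]
slope, intercept = np.polyfit(months, sales, 1)

# Explicit Figure and Axes instead of pyplot's current axes
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(months, sales, color='blue', label='Sales Data')
ax.plot(months, slope * months + intercept, color='red',
        label=f'Best Fit: y = {slope:.2f}x + {intercept:.2f}')
ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.set_title('Monthly Sales with Best Fit Line')
ax.legend()
fig.tight_layout()
plt.show()
```

Passing ax around explicitly makes it easier to place several fits side by side with plt.subplots(1, 2) later.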


Method 2: Use SciPy’s linregress for Statistical Details

If you want more statistical insights like the correlation coefficient, p-value, or standard error, SciPy’s linregress function is a great option.

Import SciPy’s stats module along with Matplotlib:

from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

Prepare your data (the same months and sales arrays as in the previous example).
Perform linear regression:

slope, intercept, r_value, p_value, std_err = stats.linregress(months, sales)

Calculate the best-fit line:

best_fit_line = slope * months + intercept

Plot data and line:

plt.scatter(months, sales, color='green', label='Sales Data')
plt.plot(months, best_fit_line, color='orange', label='Best Fit Line')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales with Regression Line')
plt.legend()
plt.show()

Optionally, print the regression statistics:

print(f"R-squared: {r_value**2:.3f}")
print(f"P-value: {p_value:.4f}")

I executed the above example code and added the screenshot below.

This method is my go-to when I need to validate the strength and significance of the trend before making business decisions.
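linregress actually returns a result object, so instead of unpacking five values you can read fields such as slope, intercept, rvalue, pvalue and stderr by name (stderr is the standard error of the slope estimate). The sketch below does that and writes R² and the p-value directly on the chart with ax.text. The text is positioned in axes coordinates (0 to 1) via transform=ax.transAxes, so it stays in the corner whatever the data range; the exact position and formatting are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])

# Keep the result object and read its fields by name
result = stats.linregress(months, sales)
r_squared = result.rvalue ** 2

fig, ax = plt.subplots()
ax.scatter(months, sales, color='green', label='Sales Data')
ax.plot(months, result.slope * months + result.intercept,
        color='orange', label='Best Fit Line')

# Annotate the fit statistics in the upper-left corner (axes coordinates)
ax.text(0.05, 0.95,
        f"$R^2$ = {r_squared:.3f}\np = {result.pvalue:.2g}\nslope SE = {result.stderr:.2f}",
        transform=ax.transAxes, va='top')

ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.set_title('Monthly Sales with Regression Line')
ax.legend(loc='lower right')
plt.show()
```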


Method 3: Use Seaborn’s regplot for Quick Visualization

If you prefer a higher-level library built on Matplotlib, Seaborn’s regplot can plot scatter points and the regression line in one call.

Steps:

Install Seaborn if you haven’t already:

pip install seaborn

Import Seaborn and prepare data:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Reuse the months and sales arrays from Method 1
data = pd.DataFrame({
    'Month': months,
    'Sales': sales
})

Plot with regression line:

sns.regplot(x='Month', y='Sales', data=data)
plt.title('Sales Trend with Best Fit Line')
plt.show()

I executed the above example code and added the screenshot below.

Seaborn handles the fitting internally and provides confidence intervals by default. I find this method handy when creating polished visualizations for reports or presentations.
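A few regplot arguments are worth knowing: ci=None removes the shaded confidence band, and scatter_kws and line_kws are dictionaries passed through to the underlying Matplotlib scatter and line calls. regplot returns the Matplotlib Axes rather than the fitted coefficients, so this sketch computes the slope separately with np.polyfit when it wants to display it. The marker size and line color are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])
data = pd.DataFrame({'Month': months, 'Sales': sales})

fig, ax = plt.subplots()
# ci=None: no confidence band; the *_kws dicts go straight to Matplotlib
sns.regplot(x='Month', y='Sales', data=data, ci=None, ax=ax,
            scatter_kws={'s': 50}, line_kws={'color': 'red'})

# regplot does not expose the fit, so compute the slope separately if you need it
slope, intercept = np.polyfit(months, sales, 1)
ax.set_title(f'Sales Trend (slope = {slope:.1f} per month)')
plt.show()
```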


Tips for Better Best Fit Line Visualizations

Label your axes clearly: it helps stakeholders understand the data context.
Choose colors wisely: make sure your best-fit line stands out but doesn’t overpower the scatter points.
Check assumptions: linear regression assumes a linear relationship; if your data is nonlinear, consider a polynomial fit.
Add statistical info: displaying R-squared or p-values can strengthen the credibility of your analysis.
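To act on the “check assumptions” tip, one option is to compare a straight-line fit with a low-degree polynomial using np.polyfit and np.polyval. The sketch below fits degrees 1 and 2 to the monthly sales data, draws both curves on a dense, sorted grid (plotting a curve only at the raw x values gives a jagged or zigzag line), and reports each fit’s sum of squared errors (SSE). Keep in mind that raising the degree never increases the in-sample SSE, so a smaller number alone does not prove the curve is the better model; check whether the points visibly bend away from the straight line.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
sales = np.array([200, 220, 250, 270, 300, 310, 330, 360, 390, 420, 450, 480])

fig, ax = plt.subplots()
ax.scatter(months, sales, color='blue', label='Sales Data')

# Dense, sorted grid so curved fits are drawn smoothly
x_grid = np.linspace(months.min(), months.max(), 200)

for deg, color in [(1, 'red'), (2, 'purple')]:
    coefficients = np.polyfit(months, sales, deg)
    # Sum of squared residuals at the observed points
    residuals = sales - np.polyval(coefficients, months)
    sse = np.sum(residuals ** 2)
    ax.plot(x_grid, np.polyval(coefficients, x_grid), color=color,
            label=f'Degree {deg} fit (SSE = {sse:.0f})')

ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.set_title('Linear vs. Quadratic Fit')
ax.legend()
plt.show()
```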

Plotting a best-fit line in Matplotlib is an essential skill for any Python developer working with data. Whether you want a quick visual trend line or detailed statistical insights, the methods I shared will cover your needs.

I start with NumPy’s polyfit for quick checks, then move to SciPy’s linregress when I need to dig deeper. For elegant visualizations, Seaborn’s regplot is my favorite.