# Matplotlib best fit line

In this Python tutorial, we will discuss how to plot the best fit line in matplotlib. We will cover the following topics:

• Best fit line
• Matplotlib best fit line
• Matplotlib best fit line using numpy.polyfit()
• Matplotlib best fit line histogram
• Matplotlib best fit curve
• Matplotlib best fit line to scatter

## Best fit line

The best fit line in a 2-dimensional graph is the line that best describes the relationship between the x-axis and y-axis coordinates of the data points plotted as a scatter plot on the graph.

The best fit line, or optimal relationship, is found by minimizing the distances of the data points from the proposed line.
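To make "minimizing the distances" concrete, here is a minimal sketch that scores two candidate lines by the sum of their squared vertical distances to the points, which is the measure used by the least squares method discussed later in this post. The data points and the helper name `sum_squared_distances` are made up for illustration.

```python
import numpy as np

# Hypothetical sample points, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def sum_squared_distances(m, c):
    """Sum of squared vertical distances from the points to y = m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))

# Two candidate lines: the one with the smaller score fits the points better
score_good = sum_squared_distances(2.0, 0.0)
score_poor = sum_squared_distances(1.0, 1.0)
print(score_good, score_poor)
```

The line y = 2x scores far lower than y = x + 1 here, so it is the better fit of the two candidates.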

A linear equation represents a line mathematically. The general form of the equation of a line is as follows:

(A * x) + (B * y) + C = 0

• Here, x and y are the variables that represent the x-axis and y-axis values of data points.
• A and B are the coefficients of the variables x and y, and C is the constant. Collectively, these are known as the parameters of a line, which decide the line’s shape and position on the graph.

However, the most commonly used form of a line is the slope-intercept form, which is as follows:

y = (m * x) + c

• Here, x and y are the variables that represent the x-axis and y-axis values of data points.
• m is the coefficient of the variable x which represents the slope of the line on the graph. Slope is the parameter of the line that decides the angle of the line on the graph.
• c is the constant value that represents the y-intercept of the line on the graph. Intercept is the parameter of the line that decides the position of the line on the graph.

We can convert the first equation to the slope-intercept form as follows (assuming B is not zero):

(A * x) + (B * y) + C = 0

(B * y) = -C - (A * x)

y = (-(A * x) - C) / B

y = ((-A / B) * x) + (-C / B)

Comparing this equation with the slope-intercept form of a line, we get m = (-A / B) and c = (-C / B).
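The conversion above can be sketched as a small helper. The function name `general_to_slope_intercept` is hypothetical and only serves this illustration.

```python
def general_to_slope_intercept(A, B, C):
    """Convert A*x + B*y + C = 0 into (m, c) of y = m*x + c."""
    if B == 0:
        raise ValueError("A vertical line (B == 0) has no slope-intercept form")
    return -A / B, -C / B

# Example: 2x - 4y + 8 = 0 is the same line as y = 0.5x + 2
m, c = general_to_slope_intercept(2, -4, 8)
print(m, c)
```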

We will be using the slope-intercept form of the line throughout this post.

The most commonly used method to find the parameters of the line that best fits the given data points is the least squares method from regression analysis.

Simple linear regression models the relationship between a single numeric dependent variable (here, y) and a single numeric independent variable (here, x).
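For a single independent variable, the least squares slope and intercept have a well-known closed form: the slope is the sum of the products of the deviations of x and y from their means divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies it to made-up data.

```python
import numpy as np

# Hypothetical sample points, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Deviations of each coordinate from its mean
dx = x - x.mean()
dy = y - y.mean()

# Least squares slope and intercept for y = m*x + c
m = np.sum(dx * dy) / np.sum(dx ** 2)
c = y.mean() - m * x.mean()
print(m, c)
```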

## Matplotlib best fit line

We can plot a line that best fits the scatter data points in matplotlib. First, we need to find the parameters of the best fit line.

We will do this using vectorized linear algebra operations.

First, let’s understand the algorithm that we will be using to find the parameters of the best fit line.

The equation of the line is: y = (m * x) + c

Let’s rewrite this as y = theta0 + (theta1 * x). Here, theta0 and theta1 are the parameters of the line, representing c (the intercept) and m (the slope) respectively.

Now, let’s change this equation into the vector form:

• Let N be the number of data points given.
• Let y be the column vector of N rows, where each row holds the y-coordinate of one data point.
• Let theta be the column vector of 2 rows, holding the parameters of the line (theta0 and theta1).
• Let X be the N x 2 matrix whose 1st column consists of the value 1 in every row and whose 2nd column consists of the x-coordinate values of the N data points.

Now, the equation in vector form will be like this: y = X . theta
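Written out in full, with the data points labeled (x_1, y_1) through (x_N, y_N) (a labeling assumed here for illustration), the vector form looks like this:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}
\begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}
$$

Each row reads y_i = theta0 + (theta1 * x_i), which is the line equation for one data point. For real, noisy data the equality holds only approximately, which is why the parameters have to be estimated.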

We can calculate the optimal parameter values (theta0 and theta1) for the given data points using the least squares method in vector form, which is as follows:

$$\theta = (X^T X)^{-1} X^T y$$

Here, $X^T$ is the transpose of the matrix X, and $(X^T X)^{-1}$ is the inverse of the matrix product $X^T X$.

Now, let’s implement this algorithm using python and plot the resulting line.

```python
# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
    [0.05, 0.11],
    [0.13, 0.14],
    [0.19, 0.17],
    [0.24, 0.21],
    [0.27, 0.24],
    [0.29, 0.32],
    [0.32, 0.30],
    [0.36, 0.39],
    [0.37, 0.42],
    [0.40, 0.40],
    [0.07, 0.09],
    [0.02, 0.04],
    [0.15, 0.19],
    [0.39, 0.32],
    [0.43, 0.48],
    [0.44, 0.41],
    [0.47, 0.49],
    [0.50, 0.57],
    [0.53, 0.59],
    [0.57, 0.51],
    [0.58, 0.60]
])

# Preparing X and y data from the given data
# X has a column of ones (for theta0) followed by the x values (for theta1)
x = dt[:, 0].reshape(dt.shape[0], 1)
X = np.append(np.ones((dt.shape[0], 1)), x, axis=1)
y = dt[:, 1].reshape(dt.shape[0], 1)

# Calculating the parameters using the least squares method
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(f'The parameters of the line: {theta}')

# Now, calculating the y-axis values against x-values according to
# the parameters theta0 and theta1
y_line = X.dot(theta)

# Plotting the data points and the best fit line
plt.scatter(x, y)
plt.plot(x, y_line, 'r')
plt.title('Best fit line using regression method')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
```
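As a cross-check on the formula, the sketch below solves the same kind of problem in two ways on made-up data: once through the least squares equation (using `np.linalg.solve` so the inverse is never formed explicitly), and once with NumPy's own least squares solver, `np.linalg.lstsq`. Both are alternatives to the explicit inverse used in the example, not a replacement for it.

```python
import numpy as np

# Hypothetical sample points, made up for illustration only
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.12, 0.19, 0.33, 0.38, 0.52])

# Design matrix: a column of ones (for theta0) next to the x values (for theta1)
X = np.column_stack((np.ones_like(x), x))

# Least squares equation, solved as a linear system instead of inverting X^T X
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same least squares problem handed to NumPy's solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal, theta_lstsq)
```

Both results hold the intercept first and the slope second, matching the order of theta0 and theta1.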

## Matplotlib best fit line using numpy.polyfit()

We can plot the best fit line to given data points using the numpy.polyfit() function.

This pre-defined function takes three mandatory arguments: the x-coordinate values (as an iterable), the y-coordinate values (as an iterable), and the degree of the polynomial (1 for linear, 2 for quadratic, 3 for cubic, …). It returns the coefficients of the fitted polynomial, ordered from the highest power to the lowest.

The syntax is as follows:

``numpy.polyfit(x, y, degree)``
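The order of the returned coefficients is easy to get wrong, so here is a minimal sketch on made-up points that lie exactly on y = 2x + 1. For degree 1, the first coefficient is the slope and the second is the intercept.

```python
import numpy as np

# Points that lie exactly on y = 2x + 1 (made up for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

coeffs = np.polyfit(x, y, 1)
slope, intercept = coeffs  # highest power first: [slope, intercept]

# np.polyval evaluates the polynomial using the same coefficient order
y_fit = np.polyval(coeffs, x)
print(slope, intercept)
```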

Now, let’s take a look at the example and understand the implementation of the function.

```python
# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
    [0.05, 0.11],
    [0.13, 0.14],
    [0.19, 0.17],
    [0.24, 0.21],
    [0.27, 0.24],
    [0.29, 0.32],
    [0.32, 0.30],
    [0.36, 0.39],
    [0.37, 0.42],
    [0.40, 0.40],
    [0.07, 0.09],
    [0.02, 0.04],
    [0.15, 0.19],
    [0.39, 0.32],
    [0.43, 0.48],
    [0.44, 0.41],
    [0.47, 0.49],
    [0.50, 0.57],
    [0.53, 0.59],
    [0.57, 0.51],
    [0.58, 0.60]
])

# Preparing X and y from the given data
X = dt[:, 0]
y = dt[:, 1]

# Calculating the parameters of the line using the numpy.polyfit() function
# (here, theta[0] is the slope and theta[1] is the intercept)
theta = np.polyfit(X, y, 1)

print(f'The parameters of the line: {theta}')

# Now, calculating the y-axis values against x-values according to
# the intercept theta[1] and the slope theta[0]
y_line = theta[1] + theta[0] * X

# Plotting the data points and the best fit line
plt.scatter(X, y)
plt.plot(X, y_line, 'r')
plt.title('Best fit line using numpy.polyfit()')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
```

## Matplotlib best fit line histogram

We can fit a distribution to the data shown in a histogram and plot the fitted curve over it in python.

We can use the scipy library for this. The steps are given below:

• First, call the function scipy.stats.norm.fit() with the data used for the histogram to estimate the parameters of the normal distribution, namely the mean and the standard deviation.
• Then, call the function scipy.stats.norm.pdf() with the parameters x (the bin edges of the histogram), the mean of the data, and the standard deviation of the data, to get the y-values of the best fit curve.
• Finally, plot the curve over the histogram. The histogram must be normalized (density=True) so that it is on the same scale as the curve.

```python
# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np
import scipy.stats

dt = np.random.normal(0, 1, 1000)

# Plotting the sample data on histogram and getting the bins
_, bins, _ = plt.hist(dt, 25, density=True, alpha=0.5)

# Getting the mean and standard deviation of the sample data dt
mn, std = scipy.stats.norm.fit(dt)

# Getting the best fit curve y values against the x data, bins
y_curve = scipy.stats.norm.pdf(bins, mn, std)

# Plotting the best fit curve
plt.plot(bins, y_curve, 'k')

plt.title('Best fit curve for histogram')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
```
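To see what scipy.stats.norm.fit() returns, the sketch below compares its output with the sample mean and the sample standard deviation of the same data. The seed is arbitrary and only makes the sample reproducible.

```python
import numpy as np
import scipy.stats

# Reproducible sample from a standard normal distribution (seed is arbitrary)
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 1000)

mn, std = scipy.stats.norm.fit(data)

# For a normal distribution, the fitted values match the sample statistics
print(np.isclose(mn, data.mean()), np.isclose(std, data.std()))
```

Note that the comparison uses NumPy's default standard deviation (dividing by N), not the N - 1 variant.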

## Matplotlib best fit curve

We can plot a curve that best fits the given data points in python if the scatter plot of the data points shows a higher-degree polynomial trend (quadratic, cubic, …).

We can use the numpy.polyfit() function, which returns the best fit polynomial coefficients for any given degree. As we have already discussed this function in the earlier topic, let’s practice an example for better understanding:

```python
# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
    [0.5, 0.28],
    [0.5, 0.29],
    [0.5, 0.33],
    [0.7, 0.21],
    [0.7, 0.23],
    [0.7, 0.26],
    [0.8, 0.24],
    [0.8, 0.25],
    [0.8, 0.29],
    [0.9, 0.28],
    [0.9, 0.30],
    [0.9, 0.31],
    [1.0, 0.30],
    [1.0, 0.33],
    [1.0, 0.35]
])

# Preparing X and y from the given data
X = dt[:, 0]
y = dt[:, 1]

# Calculating the parameters of the 2nd degree curve using the
# numpy.polyfit() function (coefficients are ordered highest power first)
theta = np.polyfit(X, y, 2)

print(f'The parameters of the curve: {theta}')

# Now, calculating the y-axis values against x-values according to
# the constant theta[2], the linear term theta[1] and the quadratic term theta[0]
y_line = theta[2] + theta[1] * pow(X, 1) + theta[0] * pow(X, 2)

# Plotting the data points and the best fit 2nd degree curve
plt.scatter(X, y)
plt.plot(X, y_line, 'r')
plt.title('2nd degree best fit curve using numpy.polyfit()')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
```
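With only a few distinct x values, a curve drawn through the data points themselves looks angular. One possible refinement, sketched below on made-up data, is to evaluate the fitted polynomial on a dense, evenly spaced grid using np.linspace() and np.polyval().

```python
import numpy as np

# Hypothetical points following a roughly quadratic trend
x = np.array([0.5, 0.7, 0.8, 0.9, 1.0])
y = np.array([0.30, 0.23, 0.26, 0.30, 0.33])

coeffs = np.polyfit(x, y, 2)

# Evaluate the fitted polynomial on 100 evenly spaced x values
x_smooth = np.linspace(x.min(), x.max(), 100)
y_smooth = np.polyval(coeffs, x_smooth)
print(x_smooth.shape, y_smooth.shape)
```

Passing x_smooth and y_smooth to plt.plot() instead of the original x values would draw a smooth curve across the same range.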

## Matplotlib best fit line to scatter

We have already discussed two different methods for getting the best fit line to a scatter plot. Let’s look at another one.

We can use the pre-defined linear regression model in the linear_model sub-module of the sklearn library to get the best fit line for the given data points. The steps to create a model and get the best fit line parameters are as follows:

• First, import LinearRegression from the sklearn.linear_model sub-module.
• Then, create a new model using LinearRegression(), let’s say model = LinearRegression().
• Next, fit the given data to the created model using the model.fit() method, which takes 2 arguments, x and y.
• Then, get the y values of the predicted best fit line by calling model.predict() with the x values as the parameter.
• Now, we can plot the resulting y values against the x values as a line plot, which gives the best fit line for the given data points.

```python
# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Importing the sklearn's linear_model,
# a pre-defined linear regression model
from sklearn.linear_model import LinearRegression

# Preparing the data to be computed and plotted
dt = np.array([
    [0.05, 0.11],
    [0.13, 0.14],
    [0.19, 0.17],
    [0.24, 0.21],
    [0.27, 0.24],
    [0.29, 0.32],
    [0.32, 0.30],
    [0.36, 0.39],
    [0.37, 0.42],
    [0.40, 0.40],
    [0.07, 0.09],
    [0.02, 0.04],
    [0.15, 0.19],
    [0.39, 0.32],
    [0.43, 0.48],
    [0.44, 0.41],
    [0.47, 0.49],
    [0.50, 0.57],
    [0.53, 0.59],
    [0.57, 0.51],
    [0.58, 0.60]
])

# Preparing X and y from the given data
X = dt[:, 0].reshape(len(dt), 1)
y = dt[:, 1].reshape(len(dt), 1)

# Creating a linear regression model and fitting the data to the model
model = LinearRegression()
model.fit(X, y)

# Now, predicting the y values according to the model
y_line = model.predict(X)

# Printing the parameters (slope and intercept) of the resulting line
print(f'The parameters of the line: slope={model.coef_[0][0]}, '
      f'intercept={model.intercept_[0]}')

# Plotting the data points and the best fit line
plt.scatter(X, y)
plt.plot(X, y_line, 'r')
plt.title('Best fit line using linear regression model from sklearn')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
```
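The fitted model stores the slope in model.coef_ and the intercept in model.intercept_. The sketch below reads both on made-up data, using a 1-D y for simplicity, and cross-checks them against numpy.polyfit().

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical sample points, made up for illustration only
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.12, 0.19, 0.33, 0.38, 0.52])

# scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
model = LinearRegression().fit(x.reshape(-1, 1), y)

slope = model.coef_[0]
intercept = model.intercept_

# Cross-check against numpy.polyfit, which returns [slope, intercept]
print(np.allclose([slope, intercept], np.polyfit(x, y, 1)))
```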