In this Python tutorial, we will learn how Scikit learn linear regression works in Python, and we will walk through different examples related to linear regression. We will cover these topics.
- Scikit learn Linear Regression
- Scikit learn Linear Regression example
- Scikit learn Linear Regression advantages and disadvantages
- Scikit learn Linear Regression gradient descent
- Scikit learn Linear Regression p-value
- Scikit learn Linear Regression multiple features
- Scikit learn Linear Regression categorical Variable
Scikit learn Linear Regression
In this section, we will learn how Scikit learn linear regression works in Python.
- Linear regression is a predictive modeling technique that investigates the relationship between a dependent variable and one or more independent variables.
- It is a linear approach: the dependent variable is modeled as a weighted sum of the independent variables plus an intercept.
Code:
In the following code, we will import LinearRegression from sklearn.linear_model to model the relationship between the dependent and independent variables.
regression = LinearRegression().fit(x, y) is used to fit the linear model, and regression.score(x, y) returns the coefficient of determination (R^2) of the fit.
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(x, np.array([1, 2])) + 3
regression = LinearRegression().fit(x, y)
print(regression.score(x, y))
Output:
After running the above code, the score of the linear regression (the coefficient of determination, R^2) is shown on the screen. Because y is an exact linear function of x here, the score is 1.0.
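Since y was built as 1 * x_0 + 2 * x_1 + 3, the fitted model should recover those numbers. Here is a minimal sketch, reusing the same data, that inspects the learned coefficients and intercept and predicts one new point. The input [3, 5] is only an illustrative value and is not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as above: y = 1 * x_0 + 2 * x_1 + 3
x = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(x, np.array([1, 2])) + 3

regression = LinearRegression().fit(x, y)

# Learned weights and intercept (should be close to [1, 2] and 3)
print(regression.coef_)
print(regression.intercept_)

# Predict an unseen point: 1 * 3 + 2 * 5 + 3 = 16 (up to floating point error)
prediction = regression.predict(np.array([[3, 5]]))
print(prediction)
```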
Scikit learn Linear Regression example
In this section, we will work through a Scikit learn linear regression example in Python.
As we know, linear regression evaluates the relationship between one or more predictor variables and a target variable.
Code:
In the following code, we will import datasets and linear_model from sklearn to evaluate the relationship between a predictor variable and the target.
- diabetes_x, diabetes_y = datasets.load_diabetes(return_X_y=True) is used to load the diabetes dataset.
- diabetes_x = diabetes_x[:, np.newaxis, 2] is used to keep only one feature (the column at index 2).
- diabetes_x_train = diabetes_x[:-20] keeps all but the last 20 samples for training; the last 20 samples form the test set.
- regression = linear_model.LinearRegression() is used to create a linear regression object.
- regression.fit(diabetes_x_train, diabetes_y_train) is used to train the model using the training set.
- print("Coefficients: \n", regression.coef_) is used to print the coefficient.
- print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred)) is used to calculate and print the mean squared error.
- plot.scatter(diabetes_x_test, diabetes_y_test, color="blue") is used to plot the test points as a scatter graph.
import matplotlib.pyplot as plot
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
diabetes_x, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_x = diabetes_x[:, np.newaxis, 2]
diabetes_x_train = diabetes_x[:-20]
diabetes_x_test = diabetes_x[-20:]
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
regression = linear_model.LinearRegression()
regression.fit(diabetes_x_train, diabetes_y_train)
diabetes_y_pred = regression.predict(diabetes_x_test)
print("Coefficients: \n", regression.coef_)
print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print("Coefficient of determination: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))
plot.scatter(diabetes_x_test, diabetes_y_test, color="blue")
plot.plot(diabetes_x_test, diabetes_y_pred, color="red", linewidth=3)
plot.xticks(())
plot.yticks(())
plot.show()
Output:
After running the above code, the coefficient, the mean squared error, and the coefficient of determination are printed on the screen, and the test points are plotted together with the fitted line.
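To make the two metrics less of a black box, the sketch below computes the mean squared error and the coefficient of determination by hand and compares them with mean_squared_error and r2_score. The y_true and y_pred arrays are made-up illustrative values, not results from the diabetes dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only (assumption): not taken from the diabetes example
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: average of the squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))  # both 0.375
print(r2_manual, r2_score(y_true, y_pred))             # both 0.925
```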
Scikit learn Linear Regression advantages and disadvantages
In this section, we will learn about the Scikit learn linear regression advantages and disadvantages in Python.
Advantages:
1. Linear regression is simple and easy to implement, and its output coefficients are easy to interpret.
2. Linear regression is prone to over-fitting, but this can be reduced with dimensionality reduction or regularization.
3. When the relationship between the dependent and independent variables is linear, linear regression fits it well.
4. Linear regression has less complexity as compared to other algorithms.
Disadvantages:
1. Outliers can have a large impact on the fitted regression line.
2. Linear regression looks at the relationship between the mean of the dependent variable and the independent variables. Just as the mean is not a complete description of a single variable, linear regression is not a complete description of the relationship among variables.
3. Linear regression assumes a straight-line relationship between the dependent and independent variables, which does not always hold.
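The sensitivity to outliers can be illustrated with a small sketch. The data below is synthetic (an assumption for illustration): ten points on the line y = 2x + 1, with the last target value replaced by 100 in the second fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): points lying exactly on y = 2x + 1
x = np.arange(10, dtype=float).reshape(-1, 1)
y_clean = 2.0 * x.ravel() + 1.0

# Copy the targets and turn the last one into an outlier
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

slope_clean = LinearRegression().fit(x, y_clean).coef_[0]
slope_outlier = LinearRegression().fit(x, y_outlier).coef_[0]

print(slope_clean)    # close to 2.0
print(slope_outlier)  # noticeably larger, pulled up by the single outlier
```

A single bad point out of ten roughly triples the estimated slope here, which is why inspecting the data for outliers before fitting is worthwhile.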
Scikit learn Linear Regression gradient descent
In this section, we will learn how gradient descent works with Scikit learn linear models in Python.
- Before moving forward, we should have some knowledge about gradient descent. The gradient acts as a slope function: it measures how much the error changes when the weights change, and the weights are updated in the opposite direction of the gradient.
- The higher the gradient, the steeper the slope and the faster the model learns. When the gradient is zero, the model stops learning.
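The update rule described above can be sketched with plain NumPy for a one-feature linear regression. This is a minimal illustration under stated assumptions, not how scikit-learn is implemented internally: the data is synthetic (y = 2x + 1), and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

# Synthetic data (assumption): y = 2x + 1 with no noise
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # start from zero weight and intercept
learning_rate = 0.5      # illustrative choice
n_iterations = 2000      # illustrative choice

for _ in range(n_iterations):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # close to 2.0 and 1.0
```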
Code:
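In the following code, we will import SGDClassifier from sklearn.linear_model, a linear model trained with stochastic gradient descent. Note that SGDClassifier is a classifier; SGDRegressor is the corresponding estimator for regression.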
from sklearn.linear_model import SGDClassifier
x = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)
clf.fit(x, y)
Output:
After running the above code, the model is fitted with stochastic gradient descent and, in an interactive session, the fitted SGDClassifier estimator is displayed on the screen.
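For a regression problem, the same idea can be tried with SGDRegressor, which fits a linear model by stochastic gradient descent. The sketch below is only an illustration: the data is synthetic, and the noise level, random_state, max_iter and tol values are assumptions chosen for the demo. Because the optimizer is stochastic, the result is approximate rather than exact.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data (assumption): y = 1 * x_0 + 2 * x_1 + 3 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 * X[:, 0] + 2.0 * X[:, 1] + 3.0 + rng.normal(scale=0.1, size=200)

# Linear regression fitted by stochastic gradient descent
sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
sgd.fit(X, y)

print(sgd.coef_)       # approximately [1, 2]
print(sgd.intercept_)  # approximately [3]
```

The features here are already on a similar scale. With real data, standardizing the features first usually helps, since stochastic gradient descent is sensitive to feature scaling.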
Scikit learn Linear Regression p-value
In this section, we will learn how to obtain p-values for a linear regression in Python.
The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. It measures statistical significance and is used to decide whether the null hypothesis is rejected or not. Generally, a p-value of less than 0.05 is treated as significant.
Code:
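In the following code, we will import LinearRegression from sklearn.linear_model together with statsmodels. Scikit learn's LinearRegression does not report p-values, so the OLS model from statsmodels is used to calculate them.
- sm.add_constant(x) is used to add a constant column, so the model includes an intercept.
- p.fit() is used to fit the OLS model.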
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from scipy import stats
diabetesdataset = datasets.load_diabetes()
x = diabetesdataset.data
y = diabetesdataset.target
x2 = sm.add_constant(x)
p = sm.OLS(y, x2)
p2 = p.fit()
print(p2.summary())
Output:
After running the above code, the OLS summary table is printed on the screen. The p-value of each coefficient appears in the P>|t| column.
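If statsmodels is not available, the same kind of p-values can be derived from a fitted LinearRegression using the standard ordinary least squares formulas and scipy. The sketch below is an illustration under assumptions: the data is synthetic, with only the first feature having a real effect, and the formulas rely on the usual OLS assumptions about the errors.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): only the first feature influences y
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=n_samples)

model = LinearRegression().fit(X, y)
params = np.append(model.intercept_, model.coef_)

# Design matrix with a leading column of ones for the intercept
design = np.column_stack([np.ones(n_samples), X])
residuals = y - model.predict(X)
dof = n_samples - design.shape[1]  # degrees of freedom

# Standard errors from the estimated covariance of the parameters
sigma2 = residuals @ residuals / dof
covariance = sigma2 * np.linalg.inv(design.T @ design)
std_err = np.sqrt(np.diag(covariance))

# Two-sided p-values from the t distribution
t_stats = params / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
print(p_values)  # order: intercept, first feature, second feature
```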
Scikit learn Linear Regression multiple features
In this section, we will learn how linear regression with multiple features works in Python.
- As we know, linear regression is a predictive modeling technique that investigates the relationship between a dependent variable and one or more independent variables.
- Linear regression can use multiple features (independent variables), and the model is fitted by ordinary least squares.
- LinearRegression fits a linear model with coefficients that minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
Code:
In the following code, we will import linear_model from sklearn to calculate the regression coefficients for two features.
- regression.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2]) is used to fit the model.
- regression.coef_ holds the fitted coefficients of the model, one per feature.
from sklearn import linear_model
regression = linear_model.LinearRegression()
regression.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(regression.coef_)
Output:
After running the above code we get the following output in which we can see that the regression coefficient is printed on the screen.
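To connect the fitted coefficients with ordinary least squares, the sketch below compares LinearRegression with a direct least squares solve from NumPy on a three-feature problem. The data and the true_coef values are synthetic assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): three features, known coefficients, intercept 4
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0

model = LinearRegression().fit(X, y)

# Least squares on a design matrix with a column of ones for the intercept
design = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(design, y, rcond=None)

print(model.intercept_, model.coef_)
print(solution)  # intercept first, then the three coefficients
```

Both approaches minimize the same residual sum of squares, so they agree up to floating point error.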
Scikit learn Linear Regression categorical Variable
In this section, we will learn how to handle a categorical variable for Scikit learn linear regression in Python.
Before moving forward, we should have some knowledge about categorical variables.
A categorical variable is one that takes values from a limited set of categories rather than numeric values, such as gender or color.
Code:
In the following code, we will import linear_model from sklearn and pandas, and use pd.get_dummies to turn a categorical column into dummy (one-hot) variables that a linear regression can use.
data = pd.DataFrame({'color': ['orange', 'blue', 'pink', 'yellow']}) is used to create a dataset.
from sklearn import linear_model
import pandas as pd
data = pd.DataFrame({'color': ['orange', 'blue', 'pink', 'yellow']})
print(pd.get_dummies(data))
Output:
After running the above code, the dummy variables are printed on the screen: one column per color, with each row marking the color it belongs to.
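The dummy columns can then be passed to LinearRegression. The sketch below is a hypothetical continuation: the price column and its values are invented for illustration, and drop_first=True is used so that one category (here blue) acts as the baseline captured by the intercept.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data (assumption): a made-up price for each colored item
data = pd.DataFrame({
    "color": ["orange", "blue", "pink", "yellow", "orange", "blue"],
    "price": [10.0, 20.0, 30.0, 40.0, 12.0, 18.0],
})

# One dummy column per color except the first one (blue), which is the baseline
features = pd.get_dummies(data["color"], drop_first=True, dtype=float)

model = LinearRegression().fit(features, data["price"])
predictions = model.predict(features)

# With only dummy features, each prediction is the mean price of that color
print(dict(zip(features.columns, model.coef_)))
print(model.intercept_)
print(predictions)
```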
So, in this tutorial, we discussed Scikit learn linear regression and covered different examples related to its implementation. Here is the list of examples that we have covered.
- Scikit learn Linear Regression
- Scikit learn Linear Regression example
- Scikit learn Linear Regression advantages and disadvantages
- Scikit learn Linear Regression gradient descent
- Scikit learn Linear Regression p-value
- Scikit learn Linear Regression multiple features
- Scikit learn Linear Regression categorical Variable
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.