As a Python developer with over a decade of experience, I’ve seen firsthand how essential Gradient Descent is in machine learning. Whether you’re tweaking linear regression models or diving into neural networks, understanding Gradient Descent can dramatically improve your model’s performance.
In this article, I’ll walk you through how to use Gradient Descent with Scikit-Learn, one of the most popular Python libraries for machine learning. I’ll share practical tips and code examples based on real-world scenarios, especially relevant to data projects common in the USA.
Let’s dive in!
What is Gradient Descent?
Gradient Descent is an optimization algorithm used to minimize the cost function in machine learning models. Think of it as a way to find the lowest point in a landscape: the point where your model's error is smallest.
In simple terms, Gradient Descent iteratively adjusts model parameters to reduce prediction errors. It’s like tuning your car’s engine to get the best mileage; small adjustments lead to better performance.
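To make the idea concrete, here's a minimal from-scratch sketch of that iterative process: plain NumPy, a made-up toy dataset, and a single weight being nudged downhill along the gradient of the mean squared error.

```python
import numpy as np

# Toy data: y is roughly 3 * x plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, 100)
y = 3 * X + rng.normal(0, 0.1, 100)

# Gradient Descent on a single weight w, minimizing MSE
w = 0.0
lr = 0.1  # learning rate (step size)
for _ in range(200):
    grad = -2 * np.mean((y - w * X) * X)  # derivative of MSE with respect to w
    w -= lr * grad  # step in the direction that reduces the error

print(f"Learned weight: {w:.2f}")  # should land close to 3
```

Each iteration computes the slope of the error surface and takes a small step against it, which is exactly what Scikit-Learn's SGD estimators do internally (one sample or mini-batch at a time).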
How Scikit-Learn Uses Gradient Descent
Scikit-Learn abstracts many complexities of Gradient Descent. While it doesn’t expose Gradient Descent directly as a standalone function, many of its estimators use variants of Gradient Descent under the hood.
For example:
- LinearRegression solves Ordinary Least Squares with a closed-form solution, so it doesn't use Gradient Descent at all.
- SGDRegressor and SGDClassifier explicitly use Stochastic Gradient Descent.
- LogisticRegression can use different solvers, including gradient-based ones such as 'sag' and 'saga'.
I’ll focus on how to work with Gradient Descent explicitly using SGDRegressor and SGDClassifier.
Method 1: Use SGDRegressor for Linear Regression with Gradient Descent
When you want to perform linear regression but prefer Gradient Descent over the closed-form solution, SGDRegressor is your go-to.
Step 1: Import Libraries and Prepare Data
Let’s consider a practical example: predicting house prices in California based on features like the number of bedrooms, square footage, and the age of the house.
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Sample dataset (replace with actual California housing data)
data = pd.read_csv('california_housing.csv')
X = data[['bedrooms', 'sqft_living', 'age']]
y = data['price']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 2: Scale Features
Gradient Descent converges faster when features are scaled.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Step 3: Initialize and Train the Model
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, learning_rate='invscaling', eta0=0.01, random_state=42)
sgd_reg.fit(X_train_scaled, y_train)
Here, max_iter caps the number of passes over the training data, learning_rate selects the step-size schedule, and eta0 is the initial learning rate.
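As a side note, with learning_rate='invscaling' Scikit-Learn shrinks the step size over time as eta0 / t**power_t (power_t defaults to 0.25). A quick snippet lets you preview how fast the learning rate decays:

```python
# Preview the 'invscaling' learning-rate schedule: eta(t) = eta0 / t**power_t
eta0, power_t = 0.01, 0.25  # defaults used in the SGDRegressor above

for t in [1, 10, 100, 1000]:
    eta = eta0 / t ** power_t
    print(f"step {t:>4}: eta = {eta:.5f}")
```

If the decay feels too aggressive (or too slow) for your data, adjusting power_t or switching to 'adaptive' is a reasonable next experiment.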
Step 4: Evaluate Performance
from sklearn.metrics import mean_squared_error
y_pred = sgd_reg.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
Method 2: Use SGDClassifier for Classification with Gradient Descent
Suppose you want to classify whether a loan application will be approved based on applicant data — a common use case in the USA financial sector.
Step 1: Prepare Data
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 2: Scale Features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Step 3: Initialize and Train the Classifier
sgd_clf = SGDClassifier(max_iter=1000, tol=1e-3, loss='log_loss', learning_rate='optimal', random_state=42)
sgd_clf.fit(X_train_scaled, y_train)
Using loss='log_loss' tells the classifier to perform logistic regression, which is suitable for binary classification.
Step 4: Evaluate Accuracy
from sklearn.metrics import accuracy_score
y_pred = sgd_clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Understand Gradient Descent Parameters in Scikit-Learn
From my experience, tuning Gradient Descent parameters is key to success:
- learning_rate: Controls how the step size changes in each iteration. Options include 'constant', 'optimal', 'invscaling', and 'adaptive'.
- eta0: Initial learning rate. Start small (e.g., 0.01) and adjust based on convergence.
- max_iter: Number of passes over the training data. More iterations can improve accuracy but increase training time.
- tol: Tolerance for stopping criteria. Training stops when the improvement is less than this value.
Experimenting with these parameters on your dataset will help you find the sweet spot.
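That experimentation doesn't have to be manual. Here's a sketch of tuning eta0 and learning_rate with GridSearchCV on synthetic data (the dataset and grid values are illustrative; swap in your own):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=42)

# Scaling lives inside the pipeline so it is refit on each CV fold
pipe = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=42))

param_grid = {
    "sgdregressor__eta0": [0.001, 0.01, 0.1],
    "sgdregressor__learning_rate": ["invscaling", "adaptive"],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Putting the scaler in the pipeline matters: it prevents test folds from leaking into the scaling statistics during cross-validation.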
Bonus: Visualize the Gradient Descent Process
If you want to see Gradient Descent in action, you can track the loss function during training by using the warm_start=True parameter and manually iterating.
Here’s a quick example using the regression model from earlier:
import matplotlib.pyplot as plt
import numpy as np
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, learning_rate='invscaling', eta0=0.01, random_state=42)
n_epochs = 50
mse_list = []
for epoch in range(n_epochs):
    sgd_reg.fit(X_train_scaled, y_train)  # warm_start=True resumes from the previous weights
    y_pred = sgd_reg.predict(X_train_scaled)
    mse = mean_squared_error(y_train, y_pred)
    mse_list.append(mse)
plt.plot(np.arange(n_epochs), mse_list)
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.title('Gradient Descent Convergence')
plt.show()
This helps you understand how the error decreases with each epoch.
When to Use Gradient Descent in Scikit-Learn
While Scikit-Learn’s default estimators like LinearRegression are efficient for many tasks, Gradient Descent shines when:
- You have large datasets where closed-form solutions are computationally expensive.
- You want to implement online learning or incremental updates.
- You want more control over the optimization process.
For example, in large-scale USA housing market predictions or financial risk assessments, SGDRegressor and SGDClassifier provide scalable solutions.
Gradient Descent is a fundamental tool in the data scientist’s toolkit. Using Scikit-Learn’s implementations, you can harness its power with minimal hassle.
If you’re working on projects involving large datasets or need fine control over optimization, I recommend experimenting with SGDRegressor and SGDClassifier. Remember to scale your features and tune parameters like learning rate and iterations for the best results.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, working with clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere. Check out my profile.