As a Python developer with over a decade of experience, I’ve seen firsthand how essential Gradient Descent is in machine learning. Whether you’re tweaking linear regression models or diving into neural networks, understanding Gradient Descent can dramatically improve your model’s performance.
In this article, I’ll walk you through how to use Gradient Descent with Scikit-Learn, one of the most popular Python libraries for machine learning. I’ll share practical tips and code examples based on real-world scenarios, especially relevant to data projects common in the USA.
Let’s dive in!
What is Gradient Descent?
Gradient Descent is an optimization algorithm used to minimize the cost function in machine learning models. Think of it as a way to find the lowest point in a landscape: the point where your model's error is smallest.
In simple terms, Gradient Descent iteratively adjusts model parameters to reduce prediction errors. It’s like tuning your car’s engine to get the best mileage; small adjustments lead to better performance.
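To make the idea concrete, here's a minimal from-scratch sketch of that iterative process: plain NumPy, a made-up toy dataset, and a single weight being nudged downhill along the gradient of the mean squared error.

```python
import numpy as np

# Toy data: y is roughly 3 * x plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, 100)
y = 3 * X + rng.normal(0, 0.1, 100)

# Gradient Descent on a single weight w, minimizing MSE
w = 0.0
lr = 0.1  # learning rate (step size)
for _ in range(200):
    grad = -2 * np.mean((y - w * X) * X)  # derivative of MSE with respect to w
    w -= lr * grad  # step in the direction that reduces the error

print(f"Learned weight: {w:.2f}")  # should land close to 3
```

Each iteration computes the slope of the error surface and takes a small step against it, which is exactly what Scikit-Learn's SGD estimators do internally (one sample or mini-batch at a time).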
How Scikit-Learn Uses Gradient Descent
Scikit-Learn abstracts many complexities of Gradient Descent. While it doesn’t expose Gradient Descent directly as a standalone function, many of its estimators use variants of Gradient Descent under the hood.
For example:
- LinearRegression solves Ordinary Least Squares with a closed-form solution, so it doesn't use Gradient Descent at all.
- SGDRegressor and SGDClassifier explicitly use Stochastic Gradient Descent.
- LogisticRegression can use different solvers, including gradient-based ones such as 'sag' and 'saga'.
I’ll focus on how to work with Gradient Descent explicitly using SGDRegressor and SGDClassifier.
Method 1: Use SGDRegressor for Linear Regression with Gradient Descent
When you want to perform linear regression but prefer Gradient Descent over the closed-form solution, SGDRegressor is your go-to.
Step 1: Import Libraries and Prepare Data
Let’s consider a practical example: predicting house prices in California based on features like the number of bedrooms, square footage, and the age of the house.
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Sample dataset (replace with actual California housing data)
data = pd.read_csv('california_housing.csv')
X = data[['bedrooms', 'sqft_living', 'age']]
y = data['price']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 2: Scale Features
Gradient Descent converges faster when features are scaled.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Step 3: Initialize and Train the Model
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, learning_rate='invscaling', eta0=0.01, random_state=42)
sgd_reg.fit(X_train_scaled, y_train)
Here, max_iter caps the number of passes over the training data, learning_rate selects the step-size schedule, and eta0 is the initial learning rate.
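As a side note, with learning_rate='invscaling' Scikit-Learn shrinks the step size over time as eta0 / t**power_t (power_t defaults to 0.25). A quick snippet lets you preview how fast the learning rate decays:

```python
# Preview the 'invscaling' learning-rate schedule: eta(t) = eta0 / t**power_t
eta0, power_t = 0.01, 0.25  # defaults used in the SGDRegressor above

for t in [1, 10, 100, 1000]:
    eta = eta0 / t ** power_t
    print(f"step {t:>4}: eta = {eta:.5f}")
```

If the decay feels too aggressive (or too slow) for your data, adjusting power_t or switching to 'adaptive' is a reasonable next experiment.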
Step 4: Evaluate Performance
from sklearn.metrics import mean_squared_error
y_pred = sgd_reg.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
Method 2: Use SGDClassifier for Classification with Gradient Descent
Suppose you want to classify whether a loan application will be approved based on applicant data — a common use case in the USA financial sector.
Step 1: Prepare Data
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Step 2: Scale Features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Step 3: Initialize and Train the Classifier
sgd_clf = SGDClassifier(max_iter=1000, tol=1e-3, loss='log_loss', learning_rate='optimal', random_state=42)
sgd_clf.fit(X_train_scaled, y_train)
Using loss='log_loss' tells the classifier to perform logistic regression, which is suitable for binary classification.
Step 4: Evaluate Accuracy
from sklearn.metrics import accuracy_score
y_pred = sgd_clf.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Understand Gradient Descent Parameters in Scikit-Learn
From my experience, tuning Gradient Descent parameters is key to success:
- learning_rate: Controls how the step size changes in each iteration. Options include 'constant', 'optimal', 'invscaling', and 'adaptive'.
- eta0: Initial learning rate. Start small (e.g., 0.01) and adjust based on convergence.
- max_iter: Number of passes over the training data. More iterations can improve accuracy but increase training time.
- tol: Tolerance for stopping criteria. Training stops when the improvement is less than this value.
Experimenting with these parameters on your dataset will help you find the sweet spot.
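That experimentation doesn't have to be manual. Here's a sketch of tuning eta0 and learning_rate with GridSearchCV on synthetic data (the dataset and grid values are illustrative; swap in your own):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=42)

# Scaling lives inside the pipeline so it is refit on each CV fold
pipe = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=42))

param_grid = {
    "sgdregressor__eta0": [0.001, 0.01, 0.1],
    "sgdregressor__learning_rate": ["invscaling", "adaptive"],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Putting the scaler in the pipeline matters: it prevents test folds from leaking into the scaling statistics during cross-validation.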
Bonus: Visualize the Gradient Descent Process
If you want to see Gradient Descent in action, you can track the loss function during training by using the warm_start=True parameter and manually iterating.
Here’s a quick example using the regression model from earlier:
import matplotlib.pyplot as plt
import numpy as np
sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, learning_rate='invscaling', eta0=0.01, random_state=42)
n_epochs = 50
mse_list = []
for epoch in range(n_epochs):
    sgd_reg.fit(X_train_scaled, y_train)  # warm_start=True resumes from the previous weights
    y_pred = sgd_reg.predict(X_train_scaled)
    mse = mean_squared_error(y_train, y_pred)
    mse_list.append(mse)
plt.plot(np.arange(n_epochs), mse_list)
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.title('Gradient Descent Convergence')
plt.show()
This helps you understand how the error decreases with each epoch.
When to Use Gradient Descent in Scikit-Learn
While Scikit-Learn’s default estimators like LinearRegression are efficient for many tasks, Gradient Descent shines when:
- You have large datasets where closed-form solutions are computationally expensive.
- You want to implement online learning or incremental updates.
- You want more control over the optimization process.
For example, in large-scale USA housing market predictions or financial risk assessments, SGDRegressor and SGDClassifier provide scalable solutions.
Gradient Descent is a fundamental tool in the data scientist’s toolkit. Using Scikit-Learn’s implementations, you can harness its power with minimal hassle.
If you’re working on projects involving large datasets or need fine control over optimization, I recommend experimenting with SGDRegressor and SGDClassifier. Remember to scale your features and tune parameters like learning rate and iterations for the best results.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, working with clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere. Check out my profile.