Scikit-learn Logistic Regression

As a Python developer with over a decade of experience, I’ve worked extensively with machine learning models. Among them, logistic regression remains one of the most useful yet simple algorithms for classification problems.

In this article, I’ll walk you through how to implement logistic regression using Scikit-learn, the go-to Python library for machine learning. I’ll share practical methods and tips based on real-world experience so you can quickly apply this in your projects.

Let’s get started!

What is Logistic Regression?

Logistic regression is a classification algorithm used to predict binary outcomes such as yes/no, true/false, or 0/1. Unlike linear regression, which predicts continuous values, logistic regression estimates the probability that a given input belongs to a particular class.

For example, suppose you want to predict whether a customer in the US will buy a product (1) or not (0) based on their age, income, and browsing history. Logistic regression can model this probability effectively.
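Under the hood, logistic regression passes a linear combination of the inputs through the sigmoid (logistic) function, which squashes any real-valued score into a probability between 0 and 1. A minimal sketch of that mapping:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear score of 0 corresponds to a 50% probability;
# large positive scores push the probability toward 1.
print(sigmoid(0.0))
print(sigmoid(4.0))
```

The model learns coefficients so that this score, and hence the probability, fits the observed classes.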

Get Started with Logistic Regression in Scikit-learn

Let me show you how to create a logistic regression model step-by-step using a practical example. Imagine you have a dataset of US bank customers, and you want to predict whether they will subscribe to a term deposit based on their features.

Step 1: Import Libraries and Load Data

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset (replace with your actual data source)
data = pd.read_csv('us_bank_customers.csv')

# Preview data
print(data.head())

Step 2: Prepare Data

Select relevant features and the target variable. Clean the data by handling missing values and encoding categorical variables.
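As a quick illustration of that cleaning step (the column names here are hypothetical, not the actual bank dataset), you might fill missing numeric values and one-hot encode a categorical column like this:

```python
import pandas as pd

# Hypothetical raw frame: 'job' is categorical, 'balance' has a missing value
raw = pd.DataFrame({
    'age': [34, 51, 29],
    'balance': [1200.0, None, 300.0],
    'job': ['admin', 'technician', 'admin'],
    'subscribed': [1, 0, 1],
})

# Fill missing numeric values with the column median
raw['balance'] = raw['balance'].fillna(raw['balance'].median())

# One-hot encode the categorical column, dropping one level to avoid redundancy
clean = pd.get_dummies(raw, columns=['job'], drop_first=True)
print(clean.columns.tolist())
```

How you handle missing values (median fill, drop, or a dedicated imputer) depends on your data; the point is that LogisticRegression expects purely numeric, complete input.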

# Features and target
X = data[['age', 'balance', 'duration', 'campaign']]
y = data['subscribed']  # 1 if subscribed, 0 otherwise

# Split into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 3: Train the Logistic Regression Model

# Initialize the model
model = LogisticRegression(max_iter=1000)

# Train the model
model.fit(X_train, y_train)

Step 4: Make Predictions and Evaluate

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))

# Confusion matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Detailed classification report
print(classification_report(y_test, y_pred))
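Beyond hard 0/1 predictions, predict_proba gives you the estimated probabilities themselves, which is often what you actually want for ranking customers or choosing a custom threshold. A self-contained sketch on synthetic data (standing in for the bank CSV above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank dataset used above
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One row per sample, one column per class; column 1 is P(class == 1).
# predict() is equivalent to thresholding that column at 0.5.
proba = model.predict_proba(X_test)
print(proba[:3])
```

In practice I often adjust that threshold away from 0.5 when the costs of false positives and false negatives differ.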


Different Ways to Use Logistic Regression in Scikit-learn

Scikit-learn supports several variations of logistic regression. Here are the ones I use most often.

1. Regular Logistic Regression (Default)

The example above uses the default logistic regression with L2 regularization. This is suitable for most cases and helps prevent overfitting.

2. Logistic Regression with L1 Regularization (Feature Selection)

L1 regularization can shrink some coefficients to zero, effectively performing feature selection. This is useful when you have many features.

model_l1 = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
model_l1.fit(X_train, y_train)
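To see the feature-selection effect, you can inspect the fitted coefficients and count how many were driven to exactly zero. This sketch uses synthetic data (make_classification with only 3 informative features out of 10) and a deliberately strong penalty (small C), since the bank CSV above is a placeholder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 of which carry signal
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, random_state=42)

# A strong L1 penalty (small C) zeroes out uninformative coefficients
model_l1 = LogisticRegression(penalty='l1', solver='liblinear',
                              C=0.05, max_iter=1000)
model_l1.fit(X, y)

n_zero = int(np.sum(model_l1.coef_ == 0))
print(f"{n_zero} of 10 coefficients were zeroed out")
```

Smaller values of C mean stronger regularization and more zeroed coefficients.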

3. Multiclass Logistic Regression

While logistic regression is most often used for binary classification, Scikit-learn also supports multiclass targets. Historically this was configured via the multi_class parameter (e.g. 'ovr' for one-vs-rest), but that parameter is deprecated as of Scikit-learn 1.5 and the strategy is now chosen automatically.

# For a multiclass target variable; recent Scikit-learn versions
# select the multiclass strategy automatically
model_multi = LogisticRegression(max_iter=1000)
model_multi.fit(X_train, y_train)

4. Use Logistic Regression with Pipeline and Scaling

In many cases, features require scaling for better model performance. Using Scikit-learn’s Pipeline simplifies preprocessing and modeling.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression(max_iter=1000))
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print("Pipeline Accuracy:", accuracy_score(y_test, y_pred))


Tips from My Experience

  • Feature Engineering Matters: Logistic regression assumes a linear relationship between features and the log odds of the outcome. Create meaningful features or apply transformations if needed.
  • Handle Imbalanced Data: In US datasets like fraud detection, the classes may be imbalanced. Consider using techniques like class weighting (class_weight='balanced') or resampling.
  • Tune Hyperparameters: Use GridSearchCV or RandomizedSearchCV to find the best regularization strength (C parameter).
  • Interpretability: Logistic regression provides coefficients that show the impact of each feature. This is valuable in industries like finance and healthcare where understanding the model is critical.
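Putting two of those tips together, here is a hedged sketch of tuning C with GridSearchCV while using class_weight='balanced' on imbalanced synthetic data (a stand-in for something like fraud detection; the parameter grid is illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Imbalanced synthetic data: roughly a 90/10 class split
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.9, 0.1], random_state=42)

# class_weight='balanced' upweights the rare class;
# the grid searches over regularization strength C
grid = GridSearchCV(
    LogisticRegression(class_weight='balanced', max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1.0, 10.0]},
    scoring='f1',
    cv=5,
)
grid.fit(X, y)
print("Best C:", grid.best_params_['C'])
```

I use F1 as the scoring metric here because plain accuracy is misleading on imbalanced classes.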

Logistic regression with Scikit-learn is a useful yet accessible tool for classification tasks. Whether you’re working on customer behavior in the US banking sector or predicting election outcomes, this method offers a solid foundation.

I encourage you to experiment with different regularization techniques and preprocessing steps to optimize your models. The examples here reflect practical workflows I’ve used repeatedly in production environments.
