As a Python developer with over a decade of experience, I’ve worked extensively with machine learning models. Among them, logistic regression remains one of the most useful yet simple algorithms for classification problems.
In this article, I’ll walk you through how to implement logistic regression using Scikit-learn, the go-to Python library for machine learning. I’ll share practical methods and tips based on real-world experience so you can quickly apply this in your projects.
Let’s get started!
What is Logistic Regression?
Logistic regression is a classification algorithm used to predict binary outcomes such as yes/no, true/false, or 0/1. Unlike linear regression, which predicts continuous values, logistic regression estimates the probability that a given input belongs to a particular class.
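Under the hood, the model passes a weighted sum of the features through the sigmoid function, which squashes any real number into the (0, 1) range so it can be read as a probability. A minimal sketch of that mapping:

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A score of 0 corresponds to a 50% probability; large positive
# scores approach 1, large negative scores approach 0.
print(sigmoid(0))            # 0.5
print(round(sigmoid(4), 3))  # 0.982
print(round(sigmoid(-4), 3)) # 0.018
```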
For example, suppose you want to predict whether a customer in the US will buy a product (1) or not (0) based on their age, income, and browsing history. Logistic regression can model this probability effectively.
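To make this concrete, here is a toy sketch with a single made-up feature (age) and a made-up buy/no-buy label; this is illustrative data, not the dataset used later in the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: age is the only feature; 1 = bought, 0 = did not.
X = np.array([[22], [25], [28], [35], [42], [48], [52], [60]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each row.
probs = model.predict_proba([[30], [55]])
print(probs)
print(model.predict([[30], [55]]))
```

A 30-year-old falls on the "no buy" side of the learned boundary here, while a 55-year-old falls on the "buy" side.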
Get Started with Logistic Regression in Scikit-learn
Let me show you how to create a logistic regression model step-by-step using a practical example. Imagine you have a dataset of US bank customers, and you want to predict whether they will subscribe to a term deposit based on their features.
Step 1: Import Libraries and Load Data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load dataset (replace with your actual data source)
data = pd.read_csv('us_bank_customers.csv')
# Preview data
print(data.head())

Step 2: Prepare Data
Select relevant features and the target variable. Clean the data by handling missing values and encoding categorical variables.
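What cleaning looks like depends on your data, but a minimal sketch might be the following (the 'job' column here is a hypothetical categorical field, not necessarily in your dataset):

```python
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column.
data = pd.DataFrame({
    'age': [34, 45, None, 29],
    'balance': [1200, 300, 950, 4100],
    'job': ['admin', 'technician', 'admin', 'services'],
    'subscribed': [0, 1, 0, 1],
})

# Fill missing numeric values with the column median.
data['age'] = data['age'].fillna(data['age'].median())

# One-hot encode the categorical column; drop_first avoids a redundant column.
data = pd.get_dummies(data, columns=['job'], drop_first=True)
print(data.head())
```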
# Features and target
X = data[['age', 'balance', 'duration', 'campaign']]
y = data['subscribed'] # 1 if subscribed, 0 otherwise
# Split into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 3: Train the Logistic Regression Model
# Initialize the model
model = LogisticRegression(max_iter=1000)
# Train the model
model.fit(X_train, y_train)

Step 4: Make Predictions and Evaluate
# Predict on test data
y_pred = model.predict(X_test)
# Evaluate accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
# Confusion matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
# Detailed classification report
print(classification_report(y_test, y_pred))
Different Ways to Use Logistic Regression in Scikit-learn
Now, let me walk you through the different ways to use logistic regression in Scikit-learn.
1. Regular Logistic Regression (Default)
The example above uses the default logistic regression with L2 regularization. This is suitable for most cases and helps prevent overfitting.
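For reference, the default constructor is equivalent to spelling out the L2 penalty and the regularization strength explicitly; smaller C means stronger regularization:

```python
from sklearn.linear_model import LogisticRegression

# These two are equivalent: penalty='l2' and C=1.0 are the defaults.
model_default = LogisticRegression(max_iter=1000)
model_explicit = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)

print(model_default.get_params()['penalty'])  # l2
print(model_default.get_params()['C'])        # 1.0
```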
2. Logistic Regression with L1 Regularization (Feature Selection)
L1 regularization can shrink some coefficients to zero, effectively performing feature selection. This is useful when you have many features.
model_l1 = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
model_l1.fit(X_train, y_train)

3. Multiclass Logistic Regression
While logistic regression is often used for binary classification, Scikit-learn supports multiclass classification using strategies like “one-vs-rest.”
# For multiclass target variable
model_multi = LogisticRegression(multi_class='ovr', max_iter=1000)
model_multi.fit(X_train, y_train)

4. Use Logistic Regression with Pipeline and Scaling
In many cases, features require scaling for better model performance. Using Scikit-learn’s Pipeline simplifies preprocessing and modeling.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
('scaler', StandardScaler()),
('logreg', LogisticRegression(max_iter=1000))
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print("Pipeline Accuracy:", accuracy_score(y_test, y_pred))
Tips from My Experience
- Feature Engineering Matters: Logistic regression assumes a linear relationship between features and the log odds of the outcome. Create meaningful features or apply transformations if needed.
- Handle Imbalanced Data: In domains like fraud detection, the classes may be heavily imbalanced. Consider techniques like class weighting (class_weight='balanced') or resampling.
- Tune Hyperparameters: Use GridSearchCV or RandomizedSearchCV to find the best regularization strength (the C parameter).
- Interpretability: Logistic regression provides coefficients that show the impact of each feature. This is valuable in industries like finance and healthcare, where understanding the model is critical.
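Putting two of these tips together, here is a sketch of tuning C with GridSearchCV while weighting classes for imbalance. It uses a synthetic dataset from make_classification rather than the bank data, so the numbers are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced dataset: roughly 90% of samples in class 0.
X, y = make_classification(n_samples=500, n_features=4,
                           weights=[0.9], random_state=42)

# class_weight='balanced' reweights samples inversely to class frequency.
grid = GridSearchCV(
    LogisticRegression(class_weight='balanced', max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1, 10]},
    cv=5,
    scoring='f1',  # accuracy is misleading on imbalanced data
)
grid.fit(X, y)
print("Best C:", grid.best_params_['C'])
print("Best CV F1:", round(grid.best_score_, 3))
```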
Logistic regression with Scikit-learn is a powerful yet accessible tool for classification tasks. Whether you’re modeling customer behavior in the US banking sector or predicting election outcomes, this method offers a solid foundation.
I encourage you to experiment with different regularization techniques and preprocessing steps to optimize your models. The examples here reflect practical workflows I’ve used repeatedly in production environments.
Other Python articles you may also like:
- Scikit-Learn Gradient Descent
- Scikit-Learn Non-Linear
- Scikit-Learn Confusion Matrix
- 51 Scikit Learn Interview Questions And Answers

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working with clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and more. Check out my profile.