Scikit-Learn Accuracy_Score

As a Python developer with over a decade of experience, I’ve worked extensively with machine learning libraries, and scikit-learn has always been my go-to toolkit. One of the fundamental metrics I use to evaluate classification models is the accuracy_score function. It’s simple, intuitive, and yet powerful enough to give you a quick snapshot of how well your model performs.

In this article, I’ll walk you through everything you need to know about accuracy_score in scikit-learn.

Let’s dive in!

Accuracy Score in Scikit-Learn

Accuracy is one of the easiest metrics to evaluate classification models. It tells you the proportion of correctly predicted labels out of the total predictions made.

The accuracy_score function from scikit-learn calculates this metric for you. It takes two arrays as input, the true labels and the predicted labels, and returns a float between 0 and 1, where 1 means perfect prediction.
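To make the calculation concrete, here is a minimal sketch with made-up labels: accuracy_score simply counts how many predictions match the true labels and divides by the total.

```python
from sklearn.metrics import accuracy_score

# Made-up labels purely for illustration
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# 3 of the 4 predictions match, so accuracy is 3 / 4 = 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```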

From my experience, accuracy is a great starting point for understanding model performance, especially when your classes are balanced. However, it might not be sufficient when dealing with imbalanced datasets, which is common in fraud detection or medical diagnosis scenarios.

How to Use Accuracy_Score: Step-by-Step

Let me show you how I typically use accuracy_score in a Python project.

Step 1: Import Required Libraries

from sklearn.metrics import accuracy_score

Step 2: Prepare Your Data

Imagine you’re working on a customer churn prediction model for a telecom company in the USA. After training your classifier, you have the true labels and model predictions.

# True labels of customers (1 = churn, 0 = stayed)
y_true = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]

# Predicted labels from your model
y_pred = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

Step 3: Calculate Accuracy

accuracy = accuracy_score(y_true, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

This will output:

Model Accuracy: 0.80

This means your model correctly predicted 80% of the cases.

Different Ways to Use Accuracy_Score

Let me show you the different ways to use accuracy_score.

1. Basic Usage

The example above shows the basic usage where you pass the true and predicted labels.

2. Use normalize Parameter

By default, accuracy_score returns the fraction of correct predictions. But if you want the count of correct predictions instead, you can set normalize=False.

correct_predictions = accuracy_score(y_true, y_pred, normalize=False)
print(f"Number of correct predictions: {correct_predictions}")

Output:

Number of correct predictions: 8

This is useful when you want to know the raw count instead of the proportion.

3. Handle Multiclass Classification

Accuracy works seamlessly for multiclass problems too. Suppose you’re classifying types of vehicles in a traffic dataset:

y_true = ['car', 'truck', 'car', 'bus', 'bus', 'truck']
y_pred = ['car', 'truck', 'bus', 'bus', 'bus', 'car']

accuracy = accuracy_score(y_true, y_pred)
print(f"Multiclass Model Accuracy: {accuracy:.2f}")

Output:

Multiclass Model Accuracy: 0.67

When Should You Use Accuracy_Score?

Accuracy is a good fit when:

  • Your dataset has balanced classes (roughly equal samples per class).
  • You want a quick and easy performance metric.
  • You are dealing with multiclass or binary classification.

However, in cases like credit card fraud detection in the USA, where fraudulent transactions are rare, accuracy can be misleading. A model that predicts all transactions as non-fraudulent can have high accuracy but poor usefulness. In such cases, metrics like precision, recall, or F1-score might be better.
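To illustrate the point, here is a small sketch with made-up data: a model that labels every transaction as non-fraudulent scores a high accuracy while catching zero fraud, which the F1-score immediately exposes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up imbalanced labels: 2 fraud cases (1) out of 100 transactions
y_true = np.array([1] * 2 + [0] * 98)

# A useless model that predicts "non-fraud" for every transaction
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))             # 0.98 - looks impressive
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 - no fraud detected
```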

Real-World Example: Predict Loan Defaults

Let me share a quick example from a loan default prediction project I worked on. The dataset contained thousands of loan records from US banks, labeled as default or no default.

After training a logistic regression classifier, I used accuracy_score to evaluate the model:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assume X and y are your features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Loan Default Prediction Accuracy: {accuracy:.2f}")

This gave me a quick sense of how well the model was doing, before diving deeper with other metrics.

Tips to Improve Accuracy Score

  1. Feature Engineering: Clean and meaningful features improve model predictions.
  2. Hyperparameter Tuning: Use grid search or random search to find the best model parameters.
  3. Balanced Dataset: Use techniques like SMOTE or class weighting if your dataset is imbalanced.
  4. Cross-Validation: Use cross-validation to get a more reliable estimate of accuracy.
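As a sketch of tip 4, cross_val_score can measure accuracy across several train/test splits; the synthetic dataset below is just a stand-in for real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; replace with your own features and labels
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000)

# Accuracy measured on 5 different folds, then averaged
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.2f}")
```

Averaging over folds smooths out the luck of any single train/test split, so the number you report is less sensitive to how the data happened to be divided.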

The accuracy_score function from scikit-learn is a simple yet effective way to measure the performance of classification models. It’s easy to use, works for both binary and multiclass problems, and gives you a quick snapshot of your model’s correctness.

While accuracy is a great starting point, always consider the nature of your dataset and problem. For imbalanced datasets, complement accuracy with other metrics like precision, recall, or F1-score.
