PyTorch Binary Cross Entropy

When working on binary classification problems in deep learning, choosing the right loss function is crucial. Recently, I was building a sentiment analysis model that needed to classify text as either positive or negative, and PyTorch’s Binary Cross Entropy (BCE) loss function proved to be exactly what I needed.

Binary Cross Entropy is a widely used loss function for binary classification tasks in PyTorch. It evaluates the performance of a classification model whose output is a probability value ranging from 0 to 1.

In this guide, I will walk you through everything you need to know about PyTorch’s Binary Cross Entropy loss function, complete with practical examples and implementations.

Binary Cross-Entropy

Binary Cross Entropy measures the difference between two probability distributions: the true distribution (your target values, which are either 0 or 1) and the predicted distribution (your model’s output, which are probability values between 0 and 1).

The formula for Binary Cross Entropy is:

BCE = -[y * log(p) + (1 - y) * log(1 - p)]

Where:

  • y is the true label (0 or 1)
  • p is the predicted probability

The loss increases as the predicted probability diverges from the actual label, making it a natural training objective for binary classifiers. For a batch of samples, PyTorch’s loss modules average the per-sample losses by default (reduction='mean').
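
To make the formula concrete, here is a minimal sanity check (standalone code, not part of the models used later in this guide) that computes the per-sample loss by hand and compares it against nn.BCELoss with reduction='none':

import torch
import torch.nn as nn

# One confident correct prediction (y=1, p=0.9) and one confident
# wrong prediction (y=1, p=0.1)
y = torch.tensor([1.0, 1.0])
p = torch.tensor([0.9, 0.1])

# Apply the BCE formula directly
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
print(manual)  # tensor([0.1054, 2.3026])

# nn.BCELoss with reduction='none' returns the same per-sample values
criterion = nn.BCELoss(reduction='none')
print(criterion(p, y))  # tensor([0.1054, 2.3026])

The confident wrong prediction is penalized roughly twenty times more heavily than the confident correct one, which is exactly the behavior you want during training.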

PyTorch Implementation of Binary Cross-Entropy

PyTorch offers two main implementations of Binary Cross-Entropy:

  1. nn.BCELoss() – Standard Binary Cross Entropy
  2. nn.BCEWithLogitsLoss() – Binary Cross Entropy with Logits

Let’s explore both of these methods.

Method 1: Use nn.BCELoss()

This is the standard implementation that expects your model to output probabilities (values between 0 and 1). You need to apply a sigmoid activation function to your model’s output before using this loss function.

Here’s a simple example:

import torch
import torch.nn as nn

# Define a simple binary classification model
class BinaryClassifier(nn.Module):
    def __init__(self, input_size):
        super(BinaryClassifier, self).__init__()
        self.linear = nn.Linear(input_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        output = self.linear(x)
        output = self.sigmoid(output)  # Apply sigmoid to get probabilities
        return output

# Create some dummy data
input_size = 10
batch_size = 32
X = torch.randn(batch_size, input_size)
y = torch.randint(0, 2, (batch_size, 1)).float()  # Binary targets (0 or 1)

# Initialize the model and loss function
model = BinaryClassifier(input_size)
criterion = nn.BCELoss()

# Forward pass
outputs = model(X)
loss = criterion(outputs, y)

print(f"Loss value: {loss.item()}")

Output (the exact values in this guide will vary between runs, since the weights and data are randomly generated):

Loss value: 0.7363111972808838

It’s important to note that your target values must be floats, not integers, when using BCELoss.
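
Here is a quick standalone illustration of that requirement, using made-up tensors:

import torch
import torch.nn as nn

probs = torch.rand(4, 1)             # fake predicted probabilities
y_int = torch.randint(0, 2, (4, 1))  # dtype is torch.int64 (Long)

# nn.BCELoss()(probs, y_int)         # raises a dtype RuntimeError
loss = nn.BCELoss()(probs, y_int.float())  # cast the targets to float first
print(loss.item())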

Method 2: Use nn.BCEWithLogitsLoss()

PyTorch’s binary cross entropy with logits combines a sigmoid activation and the binary cross entropy loss in a single class. This is more numerically stable than applying a separate sigmoid followed by BCELoss.

This approach is usually preferred because:

  1. It’s more numerically stable (see the short check after this list)
  2. It’s more computationally efficient, since the sigmoid is fused into the loss
  3. It supports a pos_weight argument for handling class imbalance
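
To see the stability point concretely, here is a minimal check with a deliberately extreme logit (standalone tensors, made up for this demonstration):

import torch
import torch.nn as nn

# A confidently wrong prediction: a large logit with a target of 0
logit = torch.tensor([[40.0]])
target = torch.tensor([[0.0]])

# Separate sigmoid + BCELoss: sigmoid(40) rounds to exactly 1.0 in
# float32, so log(1 - p) is log(0) and BCELoss falls back to its
# documented internal clamp, returning 100 instead of the true loss
prob = torch.sigmoid(logit)
print(nn.BCELoss()(prob, target).item())             # 100.0 (clamped)

# The fused version works in log-space and recovers the exact value
print(nn.BCEWithLogitsLoss()(logit, target).item())  # 40.0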

Here’s how to implement it:

import torch
import torch.nn as nn

# Define a model that outputs logits (not probabilities)
class BinaryClassifierWithLogits(nn.Module):
    def __init__(self, input_size):
        super(BinaryClassifierWithLogits, self).__init__()
        self.linear = nn.Linear(input_size, 1)

    def forward(self, x):
        output = self.linear(x)  # No sigmoid here
        return output

# Create some dummy data
input_size = 10
batch_size = 32
X = torch.randn(batch_size, input_size)
y = torch.randint(0, 2, (batch_size, 1)).float()  # Binary targets (0 or 1)

# Initialize the model and loss function
model = BinaryClassifierWithLogits(input_size)
criterion = nn.BCEWithLogitsLoss()

# Forward pass
outputs = model(X)  # These are logits, not probabilities
loss = criterion(outputs, y)

print(f"Loss value: {loss.item()}")

Output:

Loss value: 0.7850856781005859

Add Class Weights to Handle Imbalanced Data

One common challenge in binary classification is dealing with imbalanced datasets. For example, when I was working on a fraud detection model, only 2% of transactions were fraudulent. This imbalance can bias the model toward the majority class.

Fortunately, BCEWithLogitsLoss allows you to specify weights for each class:

import torch
import torch.nn as nn

# Define a simple binary classification model (outputs logits)
class BinaryClassifierWithLogits(nn.Module):
    def __init__(self, input_size):
        super(BinaryClassifierWithLogits, self).__init__()
        self.linear = nn.Linear(input_size, 1)

    def forward(self, x):
        output = self.linear(x)
        return output

# Create imbalanced dummy data (80% class 0, 20% class 1)
input_size = 10
batch_size = 100
X = torch.randn(batch_size, input_size)
y = torch.zeros(batch_size, 1)  # float dtype by default
y[:20] = 1  # Only 20% of samples are class 1

# Calculate class weights (inversely proportional to class frequencies)
num_pos = y.sum().item()
num_neg = len(y) - num_pos
pos_weight = torch.tensor([num_neg / num_pos])  # Weight for the positive class

# Initialize the model and weighted loss function
model = BinaryClassifierWithLogits(input_size)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Forward pass
outputs = model(X)
loss = criterion(outputs, y)

print(f"Loss value with class weighting: {loss.item()}")

Output:

Loss value with class weighting: 1.0177892446517944
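
Under the hood, pos_weight simply multiplies the loss term for positive (y = 1) examples. A quick standalone check with made-up tensors makes this visible:

import torch
import torch.nn as nn

# Two identical logits, one positive and one negative target
logits = torch.tensor([[0.0], [0.0]])
targets = torch.tensor([[1.0], [0.0]])

plain = nn.BCEWithLogitsLoss(reduction='none')(logits, targets)
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([4.0]),
                                reduction='none')(logits, targets)

print(plain.squeeze())     # tensor([0.6931, 0.6931])
print(weighted.squeeze())  # tensor([2.7726, 0.6931]) -- only y=1 is scaled

With the 4:1 weight computed in the example above, each missed positive sample costs four times as much as a missed negative one, which pushes the model toward the minority class.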

Real-World Example: Sentiment Analysis

Let’s implement a more practical example. We’ll build a simple sentiment analysis model using PyTorch’s Binary Cross Entropy to classify movie reviews as positive or negative.

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Sample movie reviews data
reviews = [
    "This movie was fantastic! I really enjoyed it.",
    "Great performances by all the actors.",
    "The plot was predictable but entertaining.",
    "I was disappointed with the ending.",
    "Worst movie I've seen in years.",
    "Complete waste of money, terrible acting.",
    "The special effects were amazing!",
    "I fell asleep halfway through."
]

# Labels: 1 for positive, 0 for negative
labels = [1, 1, 1, 0, 0, 0, 1, 0]

# Convert text to numerical features
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(reviews).toarray()
y = np.array(labels).reshape(-1, 1)

# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y)

# Define the model
class SentimentClassifier(nn.Module):
    def __init__(self, input_size):
        super(SentimentClassifier, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model, loss function, and optimizer
input_size = X_tensor.shape[1]
model = SentimentClassifier(input_size)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Test the model
model.eval()
with torch.no_grad():
    test_outputs = model(X_tensor)
    predicted = (torch.sigmoid(test_outputs) >= 0.5).float()
    accuracy = (predicted == y_tensor).float().mean()
    print(f'Accuracy: {accuracy.item():.4f}')

# Test with a new review
new_review = ["The movie had great visuals but a boring storyline"]
new_X = torch.FloatTensor(vectorizer.transform(new_review).toarray())
with torch.no_grad():
    new_output = model(new_X)
    probability = torch.sigmoid(new_output).item()
    prediction = "Positive" if probability >= 0.5 else "Negative"
    print(f'Review: "{new_review[0]}"')
    print(f'Sentiment: {prediction} (Probability: {probability:.4f})')

When to Use Binary Cross-Entropy

Binary Cross Entropy is ideal for:

  1. Binary classification problems (e.g., spam detection, sentiment analysis)
  2. Models that output a single probability value between 0 and 1
  3. Problems where you need to distinguish between two classes

If your problem involves more than two classes, you should use PyTorch’s nn.CrossEntropyLoss instead, as sketched below.
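
As a minimal sketch with made-up shapes: nn.CrossEntropyLoss expects raw logits with one column per class and integer class indices as targets, applying the softmax internally just as BCEWithLogitsLoss applies the sigmoid:

import torch
import torch.nn as nn

num_classes = 3
logits = torch.randn(8, num_classes)           # raw model outputs, no softmax
targets = torch.randint(0, num_classes, (8,))  # class indices 0..2, dtype Long

loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())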

In practice, I almost always use BCEWithLogitsLoss rather than the standard BCELoss because of its improved numerical stability and additional features.

I hope this tutorial has given you a clear understanding of PyTorch’s Binary Cross Entropy loss function and how to implement it in your projects. It is an essential tool for binary classification, and understanding it well will help you build more effective deep learning models.
