PyTorch nn.Sigmoid()

In my decade-plus experience with Python and deep learning frameworks, I’ve found that understanding activation functions is crucial for building effective neural networks.

Today, I want to share my knowledge about PyTorch’s nn.Sigmoid() function, a fundamental component that I use frequently in my machine learning projects.

The sigmoid function converts raw model outputs into probabilities ranging from 0 to 1, making it suitable for binary classification tasks such as spam detection, medical diagnoses, and predicting customer churn.

Let’s get into how you can implement and use this powerful function in your PyTorch projects.

Sigmoid Function in PyTorch

The sigmoid function (also known as the logistic function) is a mathematical function that maps any real-valued number to a value between 0 and 1. In PyTorch, it’s implemented as a neural network module through nn.Sigmoid().

I use this activation function primarily for:

  • Binary classification problems
  • Output layers that need probability values
  • Gates in certain recurrent neural networks

Here’s what the sigmoid function looks like mathematically:

σ(x) = 1 / (1 + e^(-x))
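As a quick sanity check, the formula above can be evaluated directly and compared against PyTorch's built-in implementation (available both as the nn.Sigmoid() module and the torch.sigmoid() function). A minimal sketch:

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

# Evaluate the formula directly: 1 / (1 + e^(-x))
manual = 1 / (1 + torch.exp(-x))

# PyTorch's built-in implementation
builtin = torch.sigmoid(x)

print("Manual: ", manual)
print("Builtin:", builtin)
print("Match:", torch.allclose(manual, builtin))
```

Note that sigmoid(0) is exactly 0.5, since e^0 = 1.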

How to Use nn.Sigmoid() in PyTorch

Let’s start with the basic implementation of the sigmoid function in PyTorch:

import torch
import torch.nn as nn

# Create a sigmoid layer
sigmoid = nn.Sigmoid()

# Sample input
x = torch.randn(5)
print("Input:", x)

# Apply sigmoid
output = sigmoid(x)
print("Output after sigmoid:", output)

When I run this code, I get outputs strictly between 0 and 1, regardless of how large or small the input values are.
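To see this saturation behavior directly, here is a small sketch that feeds deliberately extreme values through the layer:

```python
import torch
import torch.nn as nn

sigmoid = nn.Sigmoid()

# Even extreme inputs stay strictly inside (0, 1)
extreme = torch.tensor([-10.0, -5.0, 0.0, 5.0, 10.0])
out = sigmoid(extreme)
print(out)
# Large negative inputs saturate toward 0, large positive inputs toward 1
```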

Read PyTorch TanH

Create a Binary Classifier with nn.Sigmoid()

Now, let’s build something more practical: a binary classifier for a real-world scenario. Imagine we’re creating a model to predict whether a customer will churn based on various features:

import torch
import torch.nn as nn
import torch.optim as optim

# Simple binary classification model
class ChurnPredictor(nn.Module):
    def __init__(self, input_features):
        super(ChurnPredictor, self).__init__()
        self.linear = nn.Linear(input_features, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.linear(x)
        x = self.sigmoid(x)
        return x

# Example data: features might include usage stats, customer age, billing info, etc.
input_features = 10
batch_size = 64

# Create random training data (in a real scenario, you'd use actual customer data)
X = torch.randn(batch_size, input_features)  # Customer features
y = torch.randint(0, 2, (batch_size, 1)).float()  # Churn label (0 or 1)

# Initialize the model
model = ChurnPredictor(input_features)

# Define loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss works well with sigmoid
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop (simplified)
for epoch in range(100):
    # Forward pass
    predictions = model(X)
    loss = criterion(predictions, y)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

# Make predictions on new data
new_customer_data = torch.randn(5, input_features)
with torch.no_grad():
    predictions = model(new_customer_data)
    print("Churn Probabilities:", predictions.numpy())

In this example, I’m using nn.Sigmoid() as the final layer to convert the model’s raw output into probability values representing the likelihood of customer churn.

Check out PyTorch Softmax

Sigmoid vs. Softmax: When to Use Each

I often get asked about the difference between sigmoid and softmax activations. Here’s when I use each:

# Binary classification (one output)
binary_model = nn.Sequential(
    nn.Linear(10, 1),
    nn.Sigmoid()  # For binary classification
)

# Multi-class classification (multiple outputs)
multi_class_model = nn.Sequential(
    nn.Linear(10, 5),
    nn.Softmax(dim=1)  # For multi-class (5 classes in this example)
)

I use sigmoid for binary outcomes (yes/no, true/false) and softmax when I need to classify inputs into multiple exclusive categories.
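The practical difference shows up in the outputs themselves: sigmoid scores each output independently, while softmax normalizes across the class dimension so the scores sum to 1. A small sketch with made-up logits:

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.5]])

# Sigmoid treats each output independently (multi-label style);
# the values do NOT have to sum to 1
sig = torch.sigmoid(logits)
print("Sigmoid:", sig, "sum =", sig.sum().item())

# Softmax normalizes across classes (mutually exclusive categories);
# the values always sum to 1
soft = torch.softmax(logits, dim=1)
print("Softmax:", soft, "sum =", soft.sum().item())
```

This is also why sigmoid is the right choice for multi-label problems, where one input can belong to several categories at once.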

Read PyTorch Resize Images

Use Sigmoid in a Practical U.S. Healthcare Example

Let’s create a more specific U.S.-focused example: predicting diabetes risk based on patient data:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Features: age, BMI, blood pressure, glucose level, etc.
input_features = 8

class DiabetesPredictor(nn.Module):
    def __init__(self):
        super(DiabetesPredictor, self).__init__()
        self.layer1 = nn.Linear(input_features, 12)
        self.layer2 = nn.Linear(12, 8)
        self.layer3 = nn.Linear(8, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.layer3(x)
        x = self.sigmoid(x)  # Convert to probability
        return x

# Example patient data (normalized)
# In reality, you'd use actual patient records from healthcare datasets
def generate_sample_data(n_samples=100):
    # Generate synthetic patient data
    np.random.seed(42)

    # Age (normalized to 0-1 range)
    age = np.random.normal(0.5, 0.15, n_samples).reshape(-1, 1)

    # BMI (normalized)
    bmi = np.random.normal(0.45, 0.2, n_samples).reshape(-1, 1)

    # Blood pressure (normalized)
    bp = np.random.normal(0.5, 0.15, n_samples).reshape(-1, 1)

    # Glucose level (normalized)
    glucose = np.random.normal(0.45, 0.17, n_samples).reshape(-1, 1)

    # Other features
    other_features = np.random.normal(0.5, 0.15, (n_samples, 4))

    # Combine all features
    X = np.hstack([age, bmi, bp, glucose, other_features])

    # Generate labels (diabetes: yes=1, no=0)
    # Higher chance of diabetes with higher glucose and BMI
    p_diabetes = 0.3 + 0.4 * glucose.flatten() + 0.3 * bmi.flatten()
    p_diabetes = np.clip(p_diabetes, 0.1, 0.9)
    y = np.random.binomial(1, p_diabetes).reshape(-1, 1)

    return torch.FloatTensor(X), torch.FloatTensor(y)

# Generate data
X, y = generate_sample_data(500)

# Split into train and test
train_size = int(0.8 * len(X))
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]

# Initialize model, loss, and optimizer
model = DiabetesPredictor()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
epochs = 200
batch_size = 32
n_batches = len(X_train) // batch_size

for epoch in range(epochs):
    for i in range(n_batches):
        start_idx = i * batch_size
        end_idx = start_idx + batch_size

        # Get batch
        X_batch = X_train[start_idx:end_idx]
        y_batch = y_train[start_idx:end_idx]

        # Forward pass
        predictions = model(X_batch)
        loss = criterion(predictions, y_batch)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if epoch % 20 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

# Evaluate on test data
with torch.no_grad():
    test_predictions = model(X_test)
    test_predictions_binary = (test_predictions > 0.5).float()
    accuracy = (test_predictions_binary == y_test).float().mean()
    print(f"Test Accuracy: {accuracy:.4f}")

# Example: Predict diabetes risk for new patients
new_patients = torch.FloatTensor([
    [0.62, 0.75, 0.58, 0.82, 0.45, 0.52, 0.37, 0.48],  # High-risk patient
    [0.45, 0.42, 0.39, 0.44, 0.52, 0.41, 0.48, 0.47]   # Low-risk patient
])

with torch.no_grad():
    risk_predictions = model(new_patients)
    print("Diabetes Risk Probabilities:")
    for i, risk in enumerate(risk_predictions):
        print(f"Patient {i+1}: {risk.item():.4f} ({risk.item()*100:.1f}%)")

In this example, I’ve created a neural network that uses sigmoid activation in the output layer to predict diabetes risk. The model provides probability scores that healthcare providers could use to identify high-risk patients who might need preventive interventions.

Performance Considerations with nn.Sigmoid()

When I work with sigmoid in production systems, I keep these performance considerations in mind:

  1. Vanishing Gradient Issue: For very large or small inputs, the gradient of the sigmoid function becomes extremely small, which can slow down learning.
  2. Computational Efficiency: Computing exponentials in sigmoid is relatively expensive. For hidden layers, I often use ReLU instead.
  3. Numerical Stability: Pairing sigmoid with certain loss functions can lead to numerical instability. When combining sigmoid with binary cross-entropy loss, I switch to PyTorch’s fused nn.BCEWithLogitsLoss() for better stability:
# More numerically stable approach
class ImprovedModel(nn.Module):
    def __init__(self, input_size):
        super(ImprovedModel, self).__init__()
        self.linear = nn.Linear(input_size, 1)
        # No sigmoid here

    def forward(self, x):
        return self.linear(x)  # Return logits

# Use BCEWithLogitsLoss instead
model = ImprovedModel(10)
criterion = nn.BCEWithLogitsLoss()  # Combines sigmoid and BCE

This combined function is more numerically stable and efficient than using separate sigmoid and BCE loss functions.
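The vanishing-gradient point from the list above is easy to verify with autograd: the sigmoid’s derivative, σ(x)(1 − σ(x)), peaks at 0.25 when x = 0 and collapses toward zero for large |x|:

```python
import torch

# Measure the sigmoid's gradient at a few input magnitudes
grads = {}
for val in [0.0, 5.0, 10.0]:
    x = torch.tensor(val, requires_grad=True)
    torch.sigmoid(x).backward()
    grads[val] = x.grad.item()
    print(f"x={val}: gradient = {grads[val]:.6f}")
# The gradient is 0.25 at x=0 and shrinks rapidly as |x| grows,
# which is what slows learning in saturated sigmoid units
```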

When to Use (and Not Use) nn.Sigmoid()

Based on my experience, here are situations where I choose to use or avoid sigmoid:

Use sigmoid when:

  • You need a binary classifier (output layer)
  • You’re implementing certain gating mechanisms in RNNs
  • You need to model probability values between 0 and 1

Consider alternatives when:

  • Working with deep networks (use ReLU for hidden layers)
  • Dealing with multi-class classification problems (use softmax instead)
  • Experiencing vanishing gradient issues (consider tanh or ReLU variants)
  • Performance is critical (BCEWithLogitsLoss is faster than separate sigmoid + BCE)
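For the gating use case mentioned above, here is a minimal sketch of a sigmoid gate (hypothetical sizes; a real LSTM or GRU cell combines several such gates with learned weights for both input and hidden state):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Minimal sketch of a sigmoid gate, as used inside LSTM/GRU cells
hidden_size = 4
gate_layer = nn.Linear(hidden_size, hidden_size)

h = torch.randn(1, hidden_size)       # previous hidden state
gate = torch.sigmoid(gate_layer(h))   # values in (0, 1) act as soft switches
gated = gate * h                      # elementwise: how much of h passes through
print("gate:", gate)
print("gated state:", gated)
```

Because each gate value lies strictly between 0 and 1, the gate smoothly interpolates between blocking and passing each component of the hidden state.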

Implement Sigmoid in Custom Loss Functions

Sometimes I need to incorporate the sigmoid function into custom loss implementations. Here’s how I usually do it:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomSigmoidFocalLoss(nn.Module):
    """A custom loss function for imbalanced datasets that uses sigmoid internally"""
    
    def __init__(self, alpha=0.25, gamma=2.0):
        super(CustomSigmoidFocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        
    def forward(self, inputs, targets):
        # Apply sigmoid to get probabilities
        probs = torch.sigmoid(inputs)
        
        # Calculate BCE loss
        bce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        
        # Calculate focal term
        p_t = probs * targets + (1 - probs) * (1 - targets)
        focal_term = (1 - p_t) ** self.gamma
        
        # Apply alpha weighting
        alpha_weight = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        
        # Combine all terms
        focal_loss = alpha_weight * focal_term * bce_loss
        
        return focal_loss.mean()

# Usage example
model = nn.Linear(10, 1)  # Simple model that outputs logits
criterion = CustomSigmoidFocalLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Example data
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100, 1)).float()

# Training loop
for epoch in range(10):
    # Forward pass
    logits = model(X)
    loss = criterion(logits, y)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

This custom loss function is particularly useful for imbalanced datasets (like fraud detection or rare disease diagnosis), where positive examples are much less common than negative ones.

Read Use PyTorch Cat function

Visualize Sigmoid Activation in Neural Networks

To better understand how sigmoid transforms values, I often create visualizations like this:

import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn

# Create a range of input values
x = torch.linspace(-10, 10, 1000)

# Apply sigmoid
sigmoid = nn.Sigmoid()
y = sigmoid(x)

# Convert to numpy for plotting
x_np = x.numpy()
y_np = y.numpy()

# Plot
plt.figure(figsize=(10, 6))
plt.plot(x_np, y_np)
plt.grid(True)
plt.title('Sigmoid Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='r', linestyle='--', alpha=0.3)
plt.text(1, 0.45, 'y=0.5 at x=0', fontsize=12)
plt.text(-9, 0.9, 'Output approaches 1 for large positive inputs', fontsize=12)
plt.text(-9, 0.1, 'Output approaches 0 for large negative inputs', fontsize=12)

# Show output value at specific inputs
for input_val in [-5, -2, 0, 2, 5]:
    output_val = sigmoid(torch.tensor([float(input_val)])).item()
    plt.plot(input_val, output_val, 'ro')
    plt.text(input_val+0.1, output_val, f'({input_val}, {output_val:.4f})', fontsize=9)

plt.tight_layout()
plt.show()

This visualization shows how sigmoid maps any input value to an output between 0 and 1, with the steepest change happening around x=0.

Check out PyTorch Stack Tutorial

Integrate Sigmoid with Other PyTorch Components

In real-world applications, I often combine sigmoid with other PyTorch modules. Here’s a fraud detection model that demonstrates this integration:

import torch
import torch.nn as nn

class FraudDetectionModel(nn.Module):
    def __init__(self, input_features=20):
        super(FraudDetectionModel, self).__init__()
        
        # Feature extraction layers
        self.feature_extractor = nn.Sequential(
            nn.Linear(input_features, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.3),
            
            nn.Linear(64, 32),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        
        # Classification head
        self.classifier = nn.Sequential(
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()  # Final activation for binary output
        )
        
    def forward(self, x):
        features = self.feature_extractor(x)
        return self.classifier(features)

In this model, I use a sigmoid in the final layer to get fraud probability scores between 0 and 1, which helps financial institutions set appropriate risk thresholds.
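To illustrate the risk-threshold idea, here is a small sketch (with made-up probabilities) showing how moving the cutoff changes which transactions get flagged:

```python
import torch

# Hypothetical fraud probabilities, as produced by a sigmoid output layer
probs = torch.tensor([0.02, 0.35, 0.71, 0.93])

default_flags = (probs > 0.5)        # standard cutoff: flag likely fraud
conservative_flags = (probs > 0.9)   # stricter cutoff: flag only near-certain fraud
print("Flagged at 0.5:", default_flags)
print("Flagged at 0.9:", conservative_flags)
```

Raising the threshold trades recall for precision, which is exactly the knob institutions tune when false positives are costly.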

Read Create PyTorch Empty Tensor

Work with Sigmoid in Transfer Learning

When using pre-trained models, I often need to modify the final layer to include a sigmoid activation for binary classification tasks:

import torch
import torch.nn as nn
import torchvision.models as models

# Load a pre-trained model (the pretrained= argument is deprecated
# in newer torchvision; use weights= instead)
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze parameters
for param in resnet.parameters():
    param.requires_grad = False
    
# Modify the final layer for binary classification
num_features = resnet.fc.in_features
resnet.fc = nn.Sequential(
    nn.Linear(num_features, 1),
    nn.Sigmoid()
)

# Now the model outputs probabilities between 0 and 1
resnet.eval()  # switch batch norm and dropout to inference mode
sample_input = torch.randn(1, 3, 224, 224)  # Sample image
output = resnet(sample_input)
print(f"Probability: {output.item():.4f}")

This approach allows me to adapt powerful pre-trained models for specific binary classification tasks like detecting melanoma in skin images or identifying structural damage in building photos.

Working with PyTorch’s nn.Sigmoid() has become second nature to me over the years. While newer activation functions may offer advantages in certain scenarios, sigmoid remains essential for any task requiring probability outputs.

Remember that sigmoid works best in output layers for binary classification tasks, while ReLU and its variants often perform better in hidden layers. If you’re working with multi-class problems, softmax is typically a better choice than sigmoid.
