Recently, I was working on a deep learning project that required training a neural network for regression tasks. One thing I quickly learned is that choosing the right loss function is crucial for model performance.
In this tutorial, I will cover everything you need to know about PyTorch’s MSELoss function, from basic implementation to advanced techniques.
So let’s get started!
MSELoss in PyTorch
MSELoss (Mean Squared Error Loss) is one of the most commonly used loss functions for regression problems in PyTorch. It measures the average squared difference between the estimated values and the actual values.
The formula is quite simple: MSE = (1/n) * Σ(y_pred - y_actual)²
In PyTorch, this function is implemented as torch.nn.MSELoss making it incredibly easy to use in your neural network models.
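As a quick sanity check, the formula can be reproduced with plain tensor operations and compared against the built-in module (a minimal sketch with made-up numbers):

```python
import torch
import torch.nn as nn

# Made-up predictions and targets for illustration
y_pred = torch.tensor([0.5, 1.8, 2.2, 3.1])
y_actual = torch.tensor([1.0, 2.0, 2.0, 3.0])

# MSE computed directly from the formula: (1/n) * sum((y_pred - y_actual)^2)
manual_mse = ((y_pred - y_actual) ** 2).mean()

# The same value via the built-in loss module
builtin_mse = nn.MSELoss()(y_pred, y_actual)

print(manual_mse.item(), builtin_mse.item())  # both 0.085 (up to float precision)
```

Both lines print the same number, which confirms that nn.MSELoss is exactly the formula above.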
Basic Implementation of MSELoss
Let me show you how to implement MSELoss in PyTorch with a simple example:
import torch
import torch.nn as nn
# Create input tensors
predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])
# Initialize the MSE loss function
criterion = nn.MSELoss()
# Calculate loss
loss = criterion(predictions, targets)
print(f"MSE Loss: {loss.item()}")
Output:
MSE Loss: 0.08500000834465027
When you run this code, you’ll get the MSE value between the predictions and targets. It’s that simple!
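If you prefer not to instantiate a module, the same computation is also available as a plain function, torch.nn.functional.mse_loss:

```python
import torch
import torch.nn.functional as F

predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])

# Functional form: equivalent to nn.MSELoss()(predictions, targets)
loss = F.mse_loss(predictions, targets)
print(f"MSE Loss: {loss.item()}")  # 0.085 (up to float precision)
```

The module form is convenient when the loss is part of a model's configuration; the functional form is handy for one-off computations.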
MSELoss With Reduction Options
One of the useful features of PyTorch’s MSELoss is the ability to control how the loss is reduced across observations. There are three reduction options:
1. ‘mean’ (Default)
criterion = nn.MSELoss(reduction='mean')
This calculates the average of all squared differences.
2. ‘sum’
criterion = nn.MSELoss(reduction='sum')
This sums up all the squared differences without averaging.
3. ‘none’
criterion = nn.MSELoss(reduction='none')
This doesn’t perform any reduction, returning a loss value for each element in the input.
Here’s how to use these options:
# Create tensors
predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])
# Calculate MSE with different reduction methods
mse_mean = nn.MSELoss(reduction='mean')(predictions, targets)
mse_sum = nn.MSELoss(reduction='sum')(predictions, targets)
mse_none = nn.MSELoss(reduction='none')(predictions, targets)
print(f"Mean reduction: {mse_mean.item()}")
print(f"Sum reduction: {mse_sum.item()}")
print(f"No reduction: {mse_none}")
Output:
Mean reduction: 0.08500000834465027
Sum reduction: 0.3400000333786011
No reduction: tensor([0.2500, 0.0400, 0.0400, 0.0100])
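The three options are related in a simple way: 'sum' is 'mean' times the number of elements, and reducing the 'none' output yourself reproduces the other two (a small sketch):

```python
import torch
import torch.nn as nn

predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])

# Per-element squared errors, no reduction applied
per_element = nn.MSELoss(reduction='none')(predictions, targets)

# Averaging the per-element losses reproduces reduction='mean'
print(torch.allclose(per_element.mean(),
                     nn.MSELoss(reduction='mean')(predictions, targets)))  # True

# Summing them reproduces reduction='sum'
print(torch.allclose(per_element.sum(),
                     nn.MSELoss(reduction='sum')(predictions, targets)))   # True
```

The 'none' option is the most flexible: you can apply your own weighting or masking to the per-element losses before reducing them yourself.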
MSELoss In Neural Network Training
Now, let’s see how MSELoss fits into a complete neural network training loop. I’ll create a simple model to predict house prices based on square footage:
import torch
import torch.nn as nn
import torch.optim as optim
# Sample data (square footage and house prices in thousands)
X = torch.tensor([[1000], [1500], [2000], [2500], [3000]], dtype=torch.float32)
y = torch.tensor([[200], [300], [400], [500], [600]], dtype=torch.float32)
# Simple linear model
class HousePriceModel(nn.Module):
    def __init__(self):
        super(HousePriceModel, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

# Initialize model, loss and optimizer
model = HousePriceModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001)

# Training loop
epochs = 100
for epoch in range(epochs):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Test the model
with torch.no_grad():
    predicted = model(torch.tensor([[2200]], dtype=torch.float32))
    print(f"Predicted price for a 2200 sq ft house: ${predicted.item()*1000:.2f}")
Output:
Epoch [10/100], Loss: inf
Epoch [20/100], Loss: nan
Epoch [30/100], Loss: nan
Epoch [40/100], Loss: nan
Epoch [50/100], Loss: nan
Epoch [60/100], Loss: nan
Epoch [70/100], Loss: nan
Epoch [80/100], Loss: nan
Epoch [90/100], Loss: nan
Epoch [100/100], Loss: nan
Predicted price for a 2200 sq ft house: $nan
Notice that the loss immediately diverges to inf and then nan. With raw square-footage values in the thousands, the gradients are enormous, and even a learning rate of 0.0001 makes SGD overshoot on every step. This is a classic reason to scale your features (and targets) to a similar, smaller range before training.
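One way to stabilize this training loop (a sketch, not the only possible fix) is to rescale the data: square footage in thousands and prices in hundreds of thousands. With inputs of similar magnitude, a much larger learning rate works, and the loss converges instead of blowing up:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Same data as above, rescaled: sq ft in thousands, price in $100k units
X = torch.tensor([[1.0], [1.5], [2.0], [2.5], [3.0]])
y = torch.tensor([[2.0], [3.0], [4.0], [5.0], [6.0]])

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    loss = criterion(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"Final loss: {loss.item():.6f}")  # close to zero

# Predict and undo the scaling: 2200 sq ft -> input 2.2, output in $100k units
with torch.no_grad():
    pred = model(torch.tensor([[2.2]]))
    print(f"Predicted price: ${pred.item() * 100_000:.0f}")  # roughly $440,000
```

An equivalent alternative is to standardize the features (subtract the mean, divide by the standard deviation), which the scaled MSE section below also touches on.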
Weighted MSELoss Implementation
Sometimes, not all errors are created equal. You might want to penalize errors on certain data points more heavily. PyTorch doesn’t have a built-in weighted MSE, but we can easily implement it:
def weighted_mse_loss(inputs, targets, weights):
    loss = (inputs - targets) ** 2
    return (loss * weights).mean()
# Example usage
predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])
weights = torch.tensor([1.0, 2.0, 0.5, 0.8]) # Higher weight = more important
loss = weighted_mse_loss(predictions, targets, weights)
print(f"Weighted MSE Loss: {loss.item()}")
MSELoss vs. Other Loss Functions
MSELoss is just one of several loss functions available in PyTorch. Let’s compare it with other common options:
MSELoss vs. L1Loss (Mean Absolute Error)
predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])
mse_loss = nn.MSELoss()(predictions, targets)
l1_loss = nn.L1Loss()(predictions, targets)
print(f"MSE Loss: {mse_loss.item()}")
print(f"L1 Loss: {l1_loss.item()}")
MSELoss penalizes larger errors more severely than L1Loss, making it more sensitive to outliers. However, this sensitivity can be beneficial when outliers represent important cases that need attention.
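The difference is easiest to see with an outlier. In the sketch below (made-up values), a single large error dominates the MSE but affects the L1 loss far less:

```python
import torch
import torch.nn as nn

# Three perfect predictions and one that is wildly off (an outlier)
predictions = torch.tensor([1.0, 2.0, 3.0, 10.0])
targets = torch.tensor([1.0, 2.0, 3.0, 4.0])

mse = nn.MSELoss()(predictions, targets)  # 6^2 / 4 = 9.0
l1 = nn.L1Loss()(predictions, targets)    # 6 / 4   = 1.5

print(f"MSE Loss: {mse.item()}")  # 9.0
print(f"L1 Loss: {l1.item()}")    # 1.5
```

The squaring turns a single error of 6 into a loss contribution of 36, so gradients from outliers dominate training under MSE.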
MSELoss vs. SmoothL1Loss
predictions = torch.tensor([0.5, 1.8, 2.2, 3.1])
targets = torch.tensor([1.0, 2.0, 2.0, 3.0])
mse_loss = nn.MSELoss()(predictions, targets)
smooth_l1 = nn.SmoothL1Loss()(predictions, targets)
print(f"MSE Loss: {mse_loss.item()}")
print(f"Smooth L1 Loss: {smooth_l1.item()}")
SmoothL1Loss combines the best of both worlds: it behaves like L1Loss for large errors (reducing the impact of outliers) and like MSELoss for small errors (providing more stable gradients near zero).
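You can verify this piecewise behavior directly. With the default beta=1.0, SmoothL1Loss computes 0.5·x²/beta for errors with |x| < beta (the quadratic, MSE-like region) and |x| − 0.5·beta otherwise (the linear, L1-like region). A small illustration with values chosen to land in each region:

```python
import torch
import torch.nn as nn

smooth = nn.SmoothL1Loss(reduction='none')

# Small error (|x| < beta=1): quadratic region, 0.5 * x^2
small = smooth(torch.tensor([0.5]), torch.tensor([0.0]))
print(small.item())  # 0.125 = 0.5 * 0.5**2

# Large error (|x| >= beta=1): linear region, |x| - 0.5
large = smooth(torch.tensor([3.0]), torch.tensor([0.0]))
print(large.item())  # 2.5 = 3.0 - 0.5
```

The beta argument to nn.SmoothL1Loss lets you move the crossover point between the two regimes.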
Handle Common MSELoss Issues
Now, I will explain how to handle common MSELoss issues.
1. Deal with NaN Values
def safe_mse_loss(predictions, targets):
    # Replace NaN with zeros in the predictions
    mask = torch.isnan(predictions)
    predictions = torch.where(mask, torch.zeros_like(predictions), predictions)

    # Create a mask for valid pairs
    valid_mask = ~(torch.isnan(targets) | mask)

    # Calculate MSE only on valid pairs
    squared_diff = (predictions[valid_mask] - targets[valid_mask]) ** 2
    return squared_diff.mean() if squared_diff.numel() > 0 else torch.tensor(0.0)
2. MSELoss with Different Scales
When your target values span different scales, it’s often beneficial to normalize them before calculating MSE:
# Standard scaling approach
def scaled_mse_loss(predictions, targets, mean, std):
    scaled_pred = (predictions - mean) / std
    scaled_targets = (targets - mean) / std
    return nn.MSELoss()(scaled_pred, scaled_targets)
Advanced MSELoss Applications
Let me show you some advanced applications of MSELoss.
Use MSELoss for Image Reconstruction
MSELoss is commonly used in autoencoders for image reconstruction:
import torch.nn.functional as F
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 12)
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(12, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# Training loop would use MSELoss to measure reconstruction error
criterion = nn.MSELoss()
Custom Loss Function Based on MSELoss
You can create custom loss functions that build on MSELoss:
class CustomMSELoss(nn.Module):
    def __init__(self, regularization_factor=0.1):
        super(CustomMSELoss, self).__init__()
        self.mse = nn.MSELoss()
        self.reg_factor = regularization_factor

    def forward(self, predictions, targets, model):
        # Standard MSE
        mse_loss = self.mse(predictions, targets)

        # Add L2 regularization
        l2_reg = torch.tensor(0.0)
        for param in model.parameters():
            l2_reg += torch.norm(param, 2)

        return mse_loss + self.reg_factor * l2_reg
I hope you found this article helpful in understanding PyTorch’s MSELoss function. From basic usage to advanced techniques, MSELoss is a versatile tool in your deep learning toolkit for regression problems. If you have any questions or suggestions, please leave them in the comments below.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last five years. During this time I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., working for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.