Recently, I worked on a deep learning project that involved implementing a neural network for image classification. One of the fundamental components I needed was the linear layer in PyTorch. This module is essential for creating fully connected layers in neural networks, but many beginners find it challenging to implement correctly.
In this guide, I’ll cover everything you need to know about PyTorch’s nn.Linear module, from basic implementation to advanced techniques.
So let’s get started!
nn.Linear in PyTorch
The nn.Linear module in PyTorch implements a linear transformation of the form:
y = xA^T + b

Where x is the input tensor, A is the weight matrix, and b is the bias vector. This is essentially the same as the equation for a straight line (y = mx + b) but extended to multiple dimensions.
In neural networks, linear layers are used to transform input features into a different dimensional space, which can then be passed through activation functions to introduce non-linearity.
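To make the equation concrete, here is a minimal sketch (the tensor names and sizes are my own) that applies y = xA^T + b by hand and checks it against nn.Linear:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small layer: 4 input features -> 2 output features
layer = nn.Linear(in_features=4, out_features=2)
x = torch.randn(3, 4)  # batch of 3 samples, 4 features each

# Apply y = x A^T + b by hand, using the layer's own parameters
manual = x @ layer.weight.T + layer.bias

# nn.Linear produces the same result
assert torch.allclose(layer(x), manual, atol=1e-6)
print(manual.shape)  # torch.Size([3, 2])
```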
Implement nn.Linear: Basic Usage
Let’s start with the simplest implementation of a linear layer in PyTorch:
import torch
import torch.nn as nn
# Create a linear layer with 5 input features and 3 output features
linear_layer = nn.Linear(in_features=5, out_features=3)
# Create a sample input tensor
input_tensor = torch.randn(10, 5) # Batch size of 10, 5 features per sample
# Pass the input through the linear layer
output = linear_layer(input_tensor)
print(f"Input shape: {input_tensor.shape}")
print(f"Output shape: {output.shape}")

Output:
Input shape: torch.Size([10, 5])
Output shape: torch.Size([10, 3])

When you run this code, you’ll see that the input tensor of shape [10, 5] is transformed into an output tensor of shape [10, 3]. Each sample in the batch is independently transformed from 5 features to 3 features.
Understand Linear Layer Parameters
When you create an nn.Linear layer, PyTorch automatically initializes the weights and biases for you. Let’s examine these parameters:
linear_layer = nn.Linear(in_features=5, out_features=3)
# Examine the weight matrix
print(f"Weight shape: {linear_layer.weight.shape}")
print(f"Weight values:\n{linear_layer.weight}")
# Examine the bias vector
print(f"Bias shape: {linear_layer.bias.shape}")
print(f"Bias values: {linear_layer.bias}")

Output:
Weight shape: torch.Size([3, 5])
Weight values:
Parameter containing:
tensor([[-0.3828, -0.3340, 0.0215, 0.3040, -0.2527],
[-0.1246, -0.2544, -0.0492, -0.3304, 0.2196],
[-0.1534, 0.3974, -0.2299, -0.1213, 0.2030]], requires_grad=True)
Bias shape: torch.Size([3])
Bias values: Parameter containing:
tensor([ 0.2663, 0.1126, -0.0917], requires_grad=True)

The weight matrix has a shape of [out_features, in_features] and the bias vector has a shape of [out_features]. By default, PyTorch initializes these parameters using the Kaiming uniform initialization.
Create a Simple Neural Network with nn.Linear
Now let’s create a more practical example, a simple neural network for classifying the popular MNIST dataset of handwritten digits:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# Initialize the model, loss function, and optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Train the model
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss/len(train_loader):.4f}")

Output:
Epoch 1/5, Loss: 0.3398
Epoch 2/5, Loss: 0.1549
Epoch 3/5, Loss: 0.1068
Epoch 4/5, Loss: 0.0800
Epoch 5/5, Loss: 0.0634

In this example, I’ve created a simple neural network with two linear layers. The first layer transforms the flattened input image (28×28=784 features) to 128 features, and the second layer transforms these 128 features to 10 output classes (the digits 0-9).
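Before training on real data, it helps to sanity-check the layer dimensions with a random batch. This sketch traces the same shapes that SimpleNN produces, using standalone layers:

```python
import torch
import torch.nn as nn

# A fake batch of 64 grayscale 28x28 "MNIST" images
fake_images = torch.randn(64, 1, 28, 28)

flatten = nn.Flatten()
fc1 = nn.Linear(28 * 28, 128)
fc2 = nn.Linear(128, 10)

x = flatten(fake_images)   # -> (64, 784)
x = torch.relu(fc1(x))     # -> (64, 128)
logits = fc2(x)            # -> (64, 10), one score per digit class
print(logits.shape)  # torch.Size([64, 10])
```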
Advanced Usage of nn.Linear
Let me walk you through some advanced uses of nn.Linear.
Custom Weight Initialization
Sometimes, you might want to initialize the weights of your linear layers differently:
def init_weights(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

model = SimpleNN()
model.apply(init_weights)

In this example, I’m using Xavier (Glorot) initialization for the weights and setting all biases to 0.01.
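You can confirm that apply reached every submodule. This sketch uses a throwaway nn.Sequential stack (my own example, not the SimpleNN model) and checks the biases afterwards:

```python
import torch
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model.apply(init_weights)

# Every bias in every linear layer is now exactly 0.01
for m in model.modules():
    if isinstance(m, nn.Linear):
        assert torch.all(m.bias == 0.01)
```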
Linear Layer without Bias
In some cases, you might want to create a linear layer without bias:
linear_no_bias = nn.Linear(in_features=10, out_features=5, bias=False)

This is useful in situations where you’ll be applying batch normalization after the linear layer, as the bias becomes redundant.
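With bias=False the layer’s bias attribute is simply None, which you can verify directly:

```python
import torch
import torch.nn as nn

linear_no_bias = nn.Linear(in_features=10, out_features=5, bias=False)

# No bias parameter is created at all
assert linear_no_bias.bias is None

# The transformation is now purely y = x A^T
x = torch.randn(2, 10)
assert torch.allclose(linear_no_bias(x), x @ linear_no_bias.weight.T, atol=1e-6)
```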
Use nn.Linear in a Sequential Model
PyTorch’s nn.Sequential allows you to stack layers in sequence:
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

This creates a neural network with three linear layers and ReLU activations between them.
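A handy property of nn.Sequential is that the stacked layers stay indexable, so you can inspect each nn.Linear after building the model. A small sketch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Layers are addressable by position, just like list elements
assert isinstance(model[0], nn.Linear)
assert model[2].in_features == 256

# Total learnable parameters across the three linear layers
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 235146
```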
Build a Real-World Example: Stock Price Prediction
Let’s create a more practical example – a model that predicts stock prices for a major U.S. company:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Create a simple LSTM model with linear layers
class StockPredictor(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(StockPredictor, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Zero-initialized hidden and cell states for the LSTM
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Feed the last time step's hidden state to the linear layer
        out = self.fc(out[:, -1, :])
        return out
# Sample data (in practice, you would load real stock data)
# This simulates Apple stock prices for 500 days
np.random.seed(42)
dates = pd.date_range(start='2022-01-01', periods=500)
close_prices = np.random.randn(500).cumsum() + 150 # Starting around $150
# Create features (for simplicity, we'll use past 30 days to predict next day)
def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
# Normalize the data
scaler = MinMaxScaler()
close_prices_scaled = scaler.fit_transform(close_prices.reshape(-1, 1))
# Create sequences
seq_length = 30
X, y = create_sequences(close_prices_scaled, seq_length)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.FloatTensor(y_train)
y_test = torch.FloatTensor(y_test)
# Initialize model
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
model = StockPredictor(input_dim, hidden_dim, num_layers, output_dim)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Train the model
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Make predictions
model.eval()
with torch.no_grad():
    y_pred = model(X_test)
    y_pred = scaler.inverse_transform(y_pred.numpy())
    y_test_actual = scaler.inverse_transform(y_test.numpy())

In this example, I’ve used an LSTM network with a final linear layer to predict stock prices. The linear layer takes the output from the LSTM and transforms it into our predicted price.
Common Mistakes and How to Avoid Them
Let me show you some common mistakes that occur while working with nn.Linear and how to avoid them.
Mismatch in Input/Output Dimensions
One common mistake is not matching the dimensions correctly:
# Incorrect
x = torch.randn(10, 5)
layer = nn.Linear(3, 2)  # This expects 3 input features, but x has 5
output = layer(x)  # This will raise a RuntimeError about mismatched shapes

Always ensure that the in_features parameter of your linear layer matches the last dimension of your input tensor.
Forget to Flatten Input
When using linear layers after convolutional layers, you need to flatten the output:
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, 1)
        self.conv2 = nn.Conv2d(16, 32, 3, 1)
        # For a 28x28 input, two 3x3 convolutions (stride 1, no padding)
        # produce a 24x24 feature map, so the linear layer needs 32*24*24 inputs
        self.fc1 = nn.Linear(32 * 24 * 24, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.conv2(x)
        x = nn.functional.relu(x)
        # Flatten all dimensions except batch
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        return x

Forgetting to flatten the tensor before passing it to a linear layer is a common mistake that leads to dimension errors. Always make sure your data is properly reshaped.
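One way to avoid dimension mistakes entirely is to let a dummy forward pass compute the flattened size for you. A sketch using the same conv shapes as above:

```python
import torch
import torch.nn as nn

# The same conv stack as in ConvNet, wrapped in nn.Sequential
conv_stack = nn.Sequential(
    nn.Conv2d(1, 16, 3, 1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, 1),
    nn.ReLU(),
)

# Probe the stack with a dummy 28x28 grayscale image to infer the
# flattened feature count, instead of computing it by hand
with torch.no_grad():
    dummy = torch.zeros(1, 1, 28, 28)
    flat_size = conv_stack(dummy).flatten(1).shape[1]

fc = nn.Linear(flat_size, 10)
print(flat_size)  # 18432 = 32 * 24 * 24
```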
Not Initializing Weights Properly
Default initialization works well in many cases, but for deep networks, proper initialization can be crucial:
# Better practice for deep networks
def init_weights(m):
    if isinstance(m, nn.Linear):
        # Use different initialization based on the activation that follows.
        # nn.Linear doesn't record its activation, so this check relies on a
        # custom `activation` attribute you attach to the layer yourself.
        if getattr(m, 'activation', None) is nn.ReLU:
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        else:
            nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
Performance Optimization with nn.Linear
Now, let’s look at a few ways to optimize performance with nn.Linear.
Use Different Data Types
For better performance or to reduce memory usage, you can specify the data type:
# Create a half-precision linear layer for faster computation on compatible GPUs
linear_fp16 = nn.Linear(100, 50).half()
# Or use double precision if needed
linear_fp64 = nn.Linear(100, 50).double()
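You can confirm the conversion by inspecting the parameter dtypes (remember that inputs must be cast to the same dtype before the forward pass):

```python
import torch
import torch.nn as nn

linear_fp16 = nn.Linear(100, 50).half()
linear_fp64 = nn.Linear(100, 50).double()

# .half() / .double() convert both the weight and the bias in place
assert linear_fp16.weight.dtype == torch.float16
assert linear_fp16.bias.dtype == torch.float16
assert linear_fp64.weight.dtype == torch.float64
```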
Leverage GPU Acceleration
PyTorch makes it easy to use GPU acceleration:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleNN().to(device)
inputs = torch.randn(32, 784).to(device)
outputs = model(inputs)
Moving your model and data to the GPU can dramatically speed up training, especially for larger networks.
nn.Linear vs. Functional API
PyTorch offers two ways to create linear layers: the module-based nn.Linear and the functional F.linear:
import torch.nn.functional as F
# Using nn.Linear (stateful, stores weights and biases)
layer = nn.Linear(10, 5)
output = layer(input_tensor)
# Using F.linear (stateless, weights and biases must be provided)
weight = torch.randn(5, 10)
bias = torch.randn(5)
output = F.linear(input_tensor, weight, bias)
The module-based approach is generally preferred as it manages the parameters for you, but the functional API can be useful in specific scenarios where you need more flexibility.
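The two APIs are interchangeable when given the same parameters; in fact, nn.Linear’s forward pass calls F.linear internally. A quick check:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(10, 5)
x = torch.randn(4, 10)

# F.linear with the module's own parameters gives the same result
functional_out = F.linear(x, layer.weight, layer.bias)
assert torch.allclose(layer(x), functional_out)
```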
Implement a Multilayer Perceptron for Customer Churn Prediction
Let’s create a real-world example for a U.S. telecommunications company that wants to predict customer churn:
import torch.nn.functional as F

class ChurnPredictor(nn.Module):
    def __init__(self, input_size):
        super(ChurnPredictor, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.dropout1 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(64, 32)
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        x = self.sigmoid(x)
        return x
# Feature columns might include:
# - Customer tenure (months)
# - Monthly charges
# - Total charges
# - Services subscribed (internet, phone, streaming, etc.)
# - Demographics (age, gender, etc.)
# - Contract type
# - Payment method
# - etc.
# Assuming we have 20 features after one-hot encoding
input_size = 20
model = ChurnPredictor(input_size)
# Example usage
# X_train, X_test, y_train, y_test would be your actual data
# X_train = ...
# y_train = ...
# criterion = nn.BCELoss()
# optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop would follow here
In this example, I’ve created a neural network with three linear layers to predict customer churn. The first two layers use ReLU activation and dropout for regularization, while the final layer uses sigmoid activation to output a probability.
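To show how the sigmoid output pairs with BCELoss, here is a minimal training step on fabricated data. The feature values and labels are random placeholders, not real churn data, and the stand-in model mirrors ChurnPredictor’s shapes with nn.Sequential:

```python
import torch
import torch.nn as nn

# Stand-in with ChurnPredictor's shapes: 20 features -> 1 churn probability
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(32, 1), nn.Sigmoid(),
)

X_fake = torch.randn(8, 20)                    # 8 fake customers
y_fake = torch.randint(0, 2, (8, 1)).float()   # fake churn labels

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One training step: sigmoid probabilities feed straight into BCELoss
probs = model(X_fake)
loss = criterion(probs, y_fake)
loss.backward()
optimizer.step()

# Sigmoid keeps every output in [0, 1]
assert torch.all((probs >= 0) & (probs <= 1))
```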
When to Use nn.Linear vs. Other Layer Types
Understanding when to use linear layers versus other types is important:
- Linear Layers: Basic building blocks for fully connected networks, good for tabular data
- Convolutional Layers: Better for image data where spatial relationships matter
- Recurrent Layers: Better for sequential data like time series or text
- Attention Mechanisms: Effective for capturing long-range dependencies
For many problems, a combination works best. For example, CNNs for feature extraction from images, followed by linear layers for classification.
Working with PyTorch’s nn.Linear layers is fundamental to building effective neural networks. Whether you’re implementing a simple classifier or a complex deep learning model, understanding how to properly use linear layers will help you create more effective architectures.
Remember to match input and output dimensions correctly, properly initialize weights for deeper networks, and consider performance optimizations like GPU acceleration for larger models. With practice, you’ll become comfortable combining linear layers with other PyTorch modules to solve a wide range of machine learning problems.
From classifying images to predicting stock prices and customer behavior, linear layers are versatile building blocks that form the backbone of most neural network architectures. I hope this guide has given you a comprehensive understanding of how to effectively use nn.Linear in your PyTorch projects.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.