PyTorch Fully Connected Layer

Recently, I was working on a deep learning project where I needed to understand and implement fully connected layers in PyTorch. The fully connected layer, also known as a linear layer, is a fundamental building block in neural networks.

In this article, I will share my experiences with PyTorch’s fully connected layers and demonstrate how to effectively implement and use them in your neural network models.

Let’s get started!

Fully Connected Layer in PyTorch

A fully connected layer, or linear layer in PyTorch terminology, is a neural network layer where each neuron is connected to every neuron in the previous layer. This creates a dense network structure, which is why these layers are sometimes called “dense” layers.

In PyTorch, we implement fully connected layers using the nn.Linear class. The basic syntax is simple:

import torch.nn as nn

# Create a fully connected layer with 5 inputs and 3 outputs
fc_layer = nn.Linear(in_features=5, out_features=3)

The in_features parameter represents the number of input features, while out_features represents the number of output features (or neurons) in the layer.
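Under the hood, `nn.Linear` stores a weight matrix of shape `(out_features, in_features)` and a bias vector of shape `(out_features,)`. A quick sketch to confirm this for the layer above:

```python
import torch.nn as nn

# A linear layer holds a weight matrix (out_features x in_features)
# and a bias vector (out_features,)
fc_layer = nn.Linear(in_features=5, out_features=3)

print(fc_layer.weight.shape)  # torch.Size([3, 5])
print(fc_layer.bias.shape)    # torch.Size([3])
```

Knowing these shapes helps when debugging dimension mismatches between stacked layers.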

Create a Basic Fully Connected Layer

Let’s start with a simple example. Imagine we’re creating a model to predict house prices based on 4 features (size, number of bedrooms, location score, and age of the house).

import torch
import torch.nn as nn

# Create a fully connected layer
fc = nn.Linear(in_features=4, out_features=1)

# Create some sample input data (batch of 3 houses with 4 features each)
sample_input = torch.tensor([[2000, 3, 8.5, 15],
                             [1500, 2, 7.2, 10],
                             [3000, 4, 9.0, 5]], dtype=torch.float32)

# Forward pass through the fully connected layer
output = fc(sample_input)
print(output)

Output:

tensor([[ 846.2268],
        [ 634.9600],
        [1269.5112]], grad_fn=<AddmmBackward0>)


In this example, we’ve created a fully connected layer that takes 4 inputs (our house features) and produces 1 output (the predicted price). The forward pass computes the linear transformation of our input data.

When running this code, you’ll get output predictions for each of the three houses in our batch.
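The "linear transformation" here is y = xWᵀ + b. As a sanity check, we can compute it by hand from the layer's own weight and bias and compare it against what the layer returns (a minimal sketch; the exact numbers depend on the random initialization):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(in_features=4, out_features=1)
x = torch.tensor([[2000.0, 3.0, 8.5, 15.0]])

# nn.Linear computes y = x @ W^T + b
manual = x @ fc.weight.t() + fc.bias
auto = fc(x)

print(torch.allclose(manual, auto, atol=1e-4))  # True
```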


Use Multiple Fully Connected Layers

In practice, we often stack multiple fully connected layers to create a deep neural network. Here’s how to create a simple network with multiple linear layers:

import torch
import torch.nn as nn

class HousePriceModel(nn.Module):
    def __init__(self):
        super(HousePriceModel, self).__init__()
        self.fc1 = nn.Linear(4, 8)    # Input layer: 4 features to 8 neurons
        self.relu = nn.ReLU()         # Activation function
        self.fc2 = nn.Linear(8, 4)    # Hidden layer: 8 to 4 neurons
        self.fc3 = nn.Linear(4, 1)    # Output layer: 4 neurons to 1 output

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Create the model
model = HousePriceModel()

# Create sample input
sample_input = torch.tensor([[2000, 3, 8.5, 15]], dtype=torch.float32)

# Forward pass
output = model(sample_input)
print(f"Predicted house price: {output.item()}")

Output:

Predicted house price: 105.91314697265625


In this example, I’ve created a simple neural network with three fully connected layers. The first layer maps our 4 input features to 8 neurons, the second maps those 8 neurons to 4, and the final layer produces our single output value (the predicted house price).
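Each of these layers contributes `in_features * out_features` weights plus `out_features` biases, so you can verify the model size directly (a small sketch using the same layer shapes as above):

```python
import torch.nn as nn

class HousePriceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 4)
        self.fc3 = nn.Linear(4, 1)

model = HousePriceModel()

# fc1: 4*8 + 8 = 40, fc2: 8*4 + 4 = 36, fc3: 4*1 + 1 = 5 -> 81 total
total = sum(p.numel() for p in model.parameters())
print(total)  # 81
```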


Activation Functions with Fully Connected Layers

Fully connected layers typically include non-linear activation functions to allow the network to learn complex patterns. Without these activations, multiple linear layers would simply collapse into a single linear transformation.
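You can demonstrate this collapse directly: stacking two bias-free linear layers gives exactly the same mapping as one linear layer whose weight matrix is the product of the two (a minimal sketch with arbitrary layer sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(4, 8, bias=False)
b = nn.Linear(8, 1, bias=False)

# One layer with the composed weight matrix reproduces the stack exactly
combined = nn.Linear(4, 1, bias=False)
with torch.no_grad():
    combined.weight.copy_(b.weight @ a.weight)

x = torch.randn(3, 4)
print(torch.allclose(b(a(x)), combined(x), atol=1e-6))  # True
```

A non-linearity between the two layers is what prevents this reduction and gives the network its expressive power.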

Here’s how to use different activation functions with fully connected layers:

import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 20)

        # Different activation functions
        self.relu = nn.ReLU()
        self.leaky_relu = nn.LeakyReLU(0.1)
        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()

        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        # Using ReLU (most common choice)
        x = self.relu(self.fc1(x))

        # Alternative activations:
        # x = self.leaky_relu(self.fc1(x))
        # x = self.sigmoid(self.fc1(x))
        # x = self.tanh(self.fc1(x))

        x = self.fc2(x)
        return x

# Create the model
model = NeuralNetwork()

# Generate a sample input: batch of 5 samples, each with 10 features
input_data = torch.randn(5, 10)

# Pass input through the model
output = model(input_data)

# Print the output
print("Model Output:\n", output)

Output:

Model Output:
 tensor([[0.4284],
        [0.4714],
        [0.3244],
        [0.0082],
        [0.0338]], grad_fn=<AddmmBackward0>)


The ReLU activation is most commonly used with fully connected layers due to its simplicity and effectiveness. However, depending on your specific task, other activations like LeakyReLU, Sigmoid, or Tanh might work better.
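The key practical difference between these activations is their output range, which you can check directly (a small sketch over a grid of inputs):

```python
import torch

x = torch.linspace(-5, 5, steps=101)

relu_out = torch.relu(x)        # range [0, inf): zeroes out negative inputs
sigmoid_out = torch.sigmoid(x)  # range (0, 1): useful for probabilities
tanh_out = torch.tanh(x)        # range (-1, 1): zero-centered

print(relu_out.min().item())         # 0.0
print(sigmoid_out.max().item() < 1)  # True
print(tanh_out.min().item() > -1)    # True
```

Sigmoid and Tanh squash large inputs, which can cause vanishing gradients in deep stacks; ReLU avoids this for positive inputs, which is one reason it is the common default.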


Practical Example: MNIST Classification

Let’s implement a practical example: a fully connected neural network for classifying handwritten digits using the MNIST dataset.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the neural network
class MNISTClassifier(nn.Module):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        # Input images are 28x28 = 784 pixels
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)  # 10 output classes (digits 0-9)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # Flatten the input image
        x = x.view(-1, 784)

        # First fully connected layer with ReLU and dropout
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Second fully connected layer
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Output layer
        x = self.fc3(x)
        return x

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), 
                               transforms.Normalize((0.1307,), (0.3081,))])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Create the model, loss function, and optimizer
model = MNISTClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train(epochs):
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0

        for batch_idx, (data, target) in enumerate(train_loader):
            # Zero the gradients
            optimizer.zero_grad()

            # Forward pass
            output = model(data)

            # Calculate loss
            loss = criterion(output, target)

            # Backward pass
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()

            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}/{epochs}, Batch: {batch_idx}/{len(train_loader)}, Loss: {loss.item():.4f}')

        print(f'Epoch {epoch+1} complete, Loss: {running_loss/len(train_loader):.4f}')

# Train the model for 5 epochs
train(5)

In this example, I’ve created a neural network with three fully connected layers to classify MNIST digits. The input layer takes the flattened image (784 pixels), followed by two hidden layers with 128 and 64 neurons, respectively, and an output layer with 10 neurons (one for each digit).

I’ve also included dropout layers to prevent overfitting, which is a common practice when using fully connected layers.
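After training, you would typically measure accuracy on the test set. Here is a hedged sketch of such an evaluation helper (the `evaluate` function is my own illustration, checked here against synthetic 28×28 inputs rather than the real MNIST loader):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, data_loader):
    """Return classification accuracy over a data loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data, target in data_loader:
            preds = model(data).argmax(dim=1)  # class with highest score
            correct += (preds == target).sum().item()
            total += target.size(0)
    return correct / total

# Quick check with synthetic "images" and random labels
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
acc = evaluate(model, loader)
print(f"Accuracy: {acc:.2%}")
```

With the real `test_loader` from the example above, you would call `evaluate(model, test_loader)` after training.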


Best Practices for Fully Connected Layers

Based on my experience working with PyTorch’s fully connected layers, here are some best practices:

  1. Layer Sizing: Start with larger layers and narrow them down as you approach the output. This allows the network to learn a broad representation before refining it.
  2. Batch Normalization: Consider adding batch normalization between fully connected layers to improve training stability:

self.fc1 = nn.Linear(784, 128)
self.bn1 = nn.BatchNorm1d(128)

# In the forward method:
x = self.fc1(x)
x = self.bn1(x)
x = self.relu(x)

  3. Weight Initialization: PyTorch initializes weights automatically, but you can customize initialization for better performance:

# Kaiming initialization (good for ReLU activations)
nn.init.kaiming_normal_(self.fc1.weight)
nn.init.zeros_(self.fc1.bias)

  4. Regularization: Use dropout or weight decay to prevent overfitting, especially with large fully connected layers:

optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)  # L2 regularization

  5. Layer Output Inspection: During development, inspect the output distribution of your fully connected layers to ensure they're learning properly:

x = self.fc1(input_data)
print(f"Layer 1 output stats: mean={x.mean().item()}, std={x.std().item()}")
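Putting several of these practices together, here is one way a small model might look (`TunedClassifier` is a hypothetical name for illustration; it combines batch normalization, Kaiming initialization, and dropout as described above):

```python
import torch
import torch.nn as nn

class TunedClassifier(nn.Module):
    """Sketch combining batch norm, Kaiming init, and dropout."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.bn1 = nn.BatchNorm1d(128)   # stabilizes training
        self.dropout = nn.Dropout(0.2)   # regularization
        self.fc2 = nn.Linear(128, 10)
        # Kaiming initialization suits the ReLU that follows fc1
        nn.init.kaiming_normal_(self.fc1.weight)
        nn.init.zeros_(self.fc1.bias)

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        x = self.dropout(x)
        return self.fc2(x)

model = TunedClassifier()
out = model(torch.randn(8, 784))
print(out.shape)  # torch.Size([8, 10])
```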

Fully connected layers are versatile and useful components in neural networks. While they’ve been partially replaced by convolutional and transformer architectures for specific tasks, they remain essential for many applications. By understanding how to use them effectively in PyTorch, you’ll be able to build better neural networks for your specific needs.
