Recently, I was working on a project that required processing sequential data for a natural language processing task. The problem is that traditional feedforward neural networks don’t handle sequential data well, so a specialized approach is needed.
In this article, I will cover how to implement and use Recurrent Neural Networks (RNNs) in PyTorch with practical examples.
Let’s get started!
What is an RNN?
Recurrent Neural Networks are specialized neural networks designed to work with sequential data such as text, time series, or speech. Unlike traditional neural networks, RNNs have connections that loop back, allowing information to persist through time steps.
In PyTorch, implementing RNNs is straightforward thanks to built-in modules that handle the complex recurrent calculations for us.
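To make the “loop back” idea concrete, here is a small sketch (not from the article’s code) that reproduces one step of `nn.RNN` by hand. The update is `h_t = tanh(W_ih·x_t + b_ih + W_hh·h_(t-1) + b_hh)`, using the layer’s own parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)

x = torch.randn(1, 1, 3)   # one batch, one time step, three features
h0 = torch.zeros(1, 1, 4)  # initial hidden state

# Let PyTorch compute one recurrent step...
out, h1 = rnn(x, h0)

# ...and reproduce it manually from the layer's parameters
manual = torch.tanh(
    x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
print(torch.allclose(out[0], manual, atol=1e-6))  # True
```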
Set Up Your Environment
Before we start coding RNNs, let’s make sure we have everything set up:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Build a Basic RNN in PyTorch
Let’s create a simple RNN for text classification:
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.rnn(x, h0)
        # Classify from the output of the last time step
        out = self.fc(out[:, -1, :])
        return out
# Dummy input: batch_size=4, sequence_length=10, input_size=8
x_rnn = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
rnn_model = SimpleRNN(input_size=8, hidden_size=16, output_size=2).to(device)
output_rnn = rnn_model(x_rnn)
print("RNN Output:\n", output_rnn)
This model takes an input sequence, processes it through an RNN layer, and then passes the final hidden state through a fully connected layer to get our output.
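A quick way to convince yourself that `out[:, -1, :]` really is the final hidden state: for a single-layer, unidirectional `nn.RNN`, the last time step of `out` equals `h_n`. A small self-contained check:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)

out, h_n = rnn(x)  # out: (4, 10, 16), h_n: (1, 4, 16)
print(torch.allclose(out[:, -1, :], h_n[0]))  # True
```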
Implement LSTM Networks
Long Short-Term Memory (LSTM) networks are a special kind of RNN designed to avoid the vanishing gradient problem. They’re great for longer sequences:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out
# Dummy input
x_lstm = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
lstm_model = LSTMModel(input_size=8, hidden_size=16, output_size=2).to(device)
output_lstm = lstm_model(x_lstm)
print("LSTM Output:\n", output_lstm)
Work with GRUs
Gated Recurrent Units (GRUs) are another variant of RNNs that are simpler than LSTMs but still solve the vanishing gradient problem:
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GRUModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.gru(x, h0)
        out = self.fc(out[:, -1, :])
        return out
# Dummy input
x_gru = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
gru_model = GRUModel(input_size=8, hidden_size=16, output_size=2).to(device)
output_gru = gru_model(x_gru)
print("GRU Output:\n", output_gru)
Output:
RNN Output:
tensor([[ 0.5106, -0.2470],
[ 0.3434, -0.3448],
[ 0.2401, -0.3848],
[ 0.4531, -0.1722]], grad_fn=<AddmmBackward0>)
LSTM Output:
tensor([[-0.2076, 0.3041],
[-0.2976, 0.1294],
[-0.2230, 0.0996],
[-0.3048, 0.1863]], grad_fn=<AddmmBackward0>)
GRU Output:
tensor([[0.1704, 0.1796],
[0.2411, 0.2614],
[0.1312, 0.1723],
[0.1640, 0.1732]], grad_fn=<AddmmBackward0>)

Practical Example: Sentiment Analysis
Let’s implement a practical example using RNNs for sentiment analysis on movie reviews:
import torch
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
# Step 1: Tokenizer
tokenizer = get_tokenizer("basic_english")
# Step 2: Helper to yield tokens
def yield_tokens(data_iter):
    for label, line in data_iter:
        yield tokenizer(line)
# Step 3: Load IMDB (train split)
train_iter = IMDB(split='train')
# Step 4: Build vocabulary
vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])
print(f"Vocab size: {len(vocab)}")
Output:
Vocab size: 100683
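To see what the vocabulary actually does at lookup time, here is a plain-Python sketch of the same behavior: tokens map to indices, and unseen tokens fall back to the `<unk>` index (this mimics `vocab.set_default_index`; it is not torchtext code, and the tiny mapping is made up):

```python
# Toy stand-in for a torchtext vocab: token -> index, with <unk> fallback
token_to_index = {"<unk>": 0, "this": 1, "movie": 2, "was": 3, "great": 4}

def numericalize(tokens, mapping, unk_index=0):
    # Unknown tokens get the <unk> index, just like set_default_index
    return [mapping.get(tok, unk_index) for tok in tokens]

ids = numericalize(["this", "movie", "was", "terrible"], token_to_index)
print(ids)  # [1, 2, 3, 0] -- "terrible" is out of vocabulary
```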

Train an RNN Model
Now let’s see how to train our RNN model:
def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        optimizer.zero_grad()
        # Assumes a legacy torchtext iterator whose batches expose .text and .label
        text, text_lengths = batch.text
        predictions = model(text).squeeze(1)
        loss = criterion(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
# Optimizer and loss function (assumes `model` is the sentiment RNN built for this task)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
# Training loop
N_EPOCHS = 5
for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    print(f'Epoch: {epoch+1}, Train Loss: {train_loss:.3f}')
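The matching evaluation loop disables dropout and gradient tracking. Here is a self-contained sketch of that pattern; the dummy model and batches below are stand-ins just to show the call shape, and the real `model` and iterator from above would be substituted in:

```python
import torch
import torch.nn as nn

def evaluate(model, batches, criterion):
    model.eval()  # switch off dropout / batch-norm updates
    total_loss = 0.0
    with torch.no_grad():  # no gradients needed at evaluation time
        for text, labels in batches:
            predictions = model(text).squeeze(1)
            total_loss += criterion(predictions, labels).item()
    return total_loss / len(batches)

# Dummy stand-ins just to exercise the function
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 1))
batches = [(torch.randn(4, 8), torch.randint(0, 2, (4,)).float()) for _ in range(3)]
loss = evaluate(model, batches, nn.BCEWithLogitsLoss())
print(f"Eval loss: {loss:.3f}")
```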
Time Series Forecasting with RNNs
RNNs are excellent for time series data. Here’s a simple example for stock price prediction:
# Create a sample time series dataset
def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
# Sample data (e.g., stock prices)
data = np.sin(np.linspace(0, 100, 1000))
seq_length = 50
X, y = create_sequences(data, seq_length)
# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X).unsqueeze(2) # Add feature dimension
y_tensor = torch.FloatTensor(y)
# Define a time series RNN model
class TimeSeriesRNN(nn.Module):
    def __init__(self, input_size=1, hidden_size=50, output_size=1):
        super(TimeSeriesRNN, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.linear(out[:, -1, :])
        return out
# Create model, loss function, and optimizer
model = TimeSeriesRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Training loop
n_epochs = 100
batch_size = 32
for epoch in range(n_epochs):
    model.train()
    for i in range(0, len(X_tensor), batch_size):
        batch_X = X_tensor[i:i+batch_size]
        batch_y = y_tensor[i:i+batch_size]
        optimizer.zero_grad()
        outputs = model(batch_X).squeeze()
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}/{n_epochs}, Loss: {loss.item():.4f}')
RNN vs LSTM vs GRU: When to Use Each
Each type of recurrent network has its strengths:
- Basic RNNs: Good for short sequences where long-term dependencies aren’t critical.
- LSTMs: Excellent for long sequences where capturing long-term dependencies is important.
- GRUs: A good middle ground – simpler than LSTMs but still handles long-term dependencies well.
In my experience, LSTMs tend to work best for complex sequence tasks like language modeling, while GRUs can be more efficient for simpler tasks.
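One way to see the “simpler than LSTMs” claim concretely: for the same input and hidden sizes, a GRU carries three gates’ worth of parameters to an LSTM’s four (and a vanilla RNN’s one). A quick count:

```python
import torch.nn as nn

def n_params(module):
    # Total number of trainable scalars in a module
    return sum(p.numel() for p in module.parameters())

rnn = n_params(nn.RNN(8, 16, batch_first=True))
gru = n_params(nn.GRU(8, 16, batch_first=True))
lstm = n_params(nn.LSTM(8, 16, batch_first=True))
print(rnn, gru, lstm)  # 416 1248 1664 -- GRU = 3x a plain RNN, LSTM = 4x
```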
Tips for Optimizing RNN Performance
- Use Gradient Clipping: This prevents exploding gradients, a common issue in RNNs.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
- Batch Normalization: Can help stabilize and speed up training.
self.bn = nn.BatchNorm1d(hidden_size)
- Dropout: Helps prevent overfitting.
self.dropout = nn.Dropout(0.5)
- Bidirectional RNNs: Consider these for tasks where context from both directions matters.
self.lstm = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)
I hope you found this article helpful. RNNs are powerful tools for working with sequential data in PyTorch, and with variants like LSTM and GRU, you can tackle complex problems ranging from natural language processing to time series forecasting.
Remember that the best RNN architecture depends on your specific task: experiment with different configurations to find what works best for your data and problem.
Real-World Applications of RNNs
RNNs have numerous applications across various domains. Let me share a few examples I’ve worked with:
Natural Language Processing
One of my recent projects involved building a text generation system for creating product descriptions:
class TextGenerationRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=dropout, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, hidden=None):
        # text shape: [batch size, sequence length]
        embedded = self.dropout(self.embedding(text))
        # embedded shape: [batch size, sequence length, embedding dim]
        if hidden is None:
            output, (hidden, cell) = self.lstm(embedded)
        else:
            output, (hidden, cell) = self.lstm(embedded, hidden)
        # output shape: [batch size, sequence length, hidden dim]
        output = self.dropout(output)
        prediction = self.fc(output)
        # prediction shape: [batch size, sequence length, vocab size]
        return prediction, (hidden, cell)
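Once a model like this is trained, text is produced by feeding tokens back in one at a time and reusing the hidden state. The sketch below shows greedy decoding with a deliberately tiny, untrained stand-in model (the vocabulary size, dimensions, and class name here are illustrative, not from the original project):

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Minimal generator with the same interface as TextGenerationRNN above."""
    def __init__(self, vocab_size=20, embedding_dim=8, hidden_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        out, hidden = self.lstm(self.embedding(tokens), hidden)
        return self.fc(out), hidden

def generate(model, start_token, max_len=10):
    model.eval()
    tokens = [start_token]
    hidden = None
    inp = torch.tensor([[start_token]])
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits, hidden = model(inp, hidden)      # carry the state forward
            next_token = logits[0, -1].argmax().item()  # greedy choice
            tokens.append(next_token)
            inp = torch.tensor([[next_token]])
    return tokens

torch.manual_seed(0)
gen = generate(TinyGenerator(), start_token=1)
print(len(gen))  # 10 tokens generated
```

In practice you would sample from the softmax (optionally with a temperature) rather than taking the argmax, which tends to produce repetitive text.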
Predictive Analytics for Business
For a U.S. retail client, I implemented an RNN to forecast sales based on historical data:
class SalesForecaster(nn.Module):
    def __init__(self, input_features, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.3 if num_layers > 1 else 0
        )
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: [batch_size, seq_len, features]
        lstm_out, _ = self.lstm(x)
        # We only want the last time step for forecasting
        y_pred = self.linear(lstm_out[:, -1, :])
        return y_pred
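A quick smoke test of this kind of forecaster with made-up numbers (say, 8 stores, 12 weeks of history, 5 features per week; all sizes here are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Minimal stand-in with the same shape contract as SalesForecaster above
lstm = nn.LSTM(input_size=5, hidden_size=64, num_layers=2, batch_first=True, dropout=0.3)
head = nn.Linear(64, 1)

x = torch.randn(8, 12, 5)            # 8 stores, 12 weeks, 5 features each
lstm_out, _ = lstm(x)                # (8, 12, 64)
forecast = head(lstm_out[:, -1, :])  # one predicted value per store
print(forecast.shape)  # torch.Size([8, 1])
```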
Handle Variable Length Sequences
In real-world applications, sequence lengths often vary. Here’s how to handle that efficiently:
def prepare_sequence_batch(sequences):
    """Pack variable length sequences for efficient processing"""
    # Sort sequences by length in descending order
    sequences.sort(key=lambda x: len(x), reverse=True)
    lengths = [len(seq) for seq in sequences]
    # Pad sequences
    padded_seqs = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
    # Pack padded sequences
    packed_seqs = torch.nn.utils.rnn.pack_padded_sequence(
        padded_seqs, lengths, batch_first=True
    )
    return packed_seqs
# In your model:
packed_output, (hidden, cell) = self.lstm(packed_sequences)
# Unpack if needed
output, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
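Here is a self-contained round trip with three sequences of different lengths, showing that packing preserves the data and that padded positions come back as zeros (lengths must be in descending order for the default `enforce_sorted=True`):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three sequences of lengths 5, 3, and 2, each with 2 features per step
seqs = [torch.ones(5, 2), torch.ones(3, 2) * 2, torch.ones(2, 2) * 3]
lengths = [len(s) for s in seqs]

padded = pad_sequence(seqs, batch_first=True)              # (3, 5, 2), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True)
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)

print(out_lengths.tolist())           # [5, 3, 2] -- original lengths recovered
print(torch.equal(unpacked, padded))  # True -- round trip is lossless
```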
Transfer Learning with RNNs
Transfer learning is also possible with RNNs. For a recent project analyzing social media sentiment for a U.S. company, I fine-tuned a pre-trained language model:
# Load pre-trained embedding vectors (like GloVe)
pretrained_embeddings = torch.FloatTensor(np.load('glove_embeddings.npy'))
class TransferRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, pretrained_embeddings):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Load pretrained embeddings
        self.embedding.weight.data.copy_(pretrained_embeddings)
        # Freeze embeddings
        self.embedding.weight.requires_grad = False
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, text):
        embedded = self.embedding(text)
        output, (hidden, cell) = self.rnn(embedded)
        # Concatenate the final forward and backward hidden states
        hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        return self.fc(hidden)
Performance Optimization and Scalability
When working with large datasets, optimizing RNN performance becomes crucial:
Vectorized Operations
Whenever possible, use vectorized operations instead of Python loops:
# Instead of processing elements one at a time in Python loops:
# for i in range(batch_size):
#     for j in range(seq_length):
#         ...  # process each element
# use a single vectorized operation over the whole tensor:
processed_batch = torch.nn.functional.relu(batch_tensor)
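To check that the two approaches agree, the sketch below fills in the loop body with an element-wise ReLU and compares it to the vectorized call (the explicit loop is purely illustrative; you would never write it in practice):

```python
import torch

torch.manual_seed(0)
batch_tensor = torch.randn(4, 6)

# Slow, element-by-element version of ReLU
looped = torch.empty_like(batch_tensor)
for i in range(batch_tensor.size(0)):
    for j in range(batch_tensor.size(1)):
        looped[i, j] = max(batch_tensor[i, j].item(), 0.0)

# Vectorized version: one call over the whole tensor
vectorized = torch.nn.functional.relu(batch_tensor)
print(torch.allclose(looped, vectorized))  # True
```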
GPU Acceleration
Moving your model to GPU can significantly speed up training:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
inputs = inputs.to(device)
Mixed Precision Training
For very large models, consider using mixed precision training:
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Deployment Considerations
When deploying RNN models to production, keep these factors in mind:
- Model Size: RNNs, especially LSTMs with many layers, can be large. Consider quantization or distillation for mobile/edge devices.
- Inference Speed: For real-time applications, GRUs may be preferable to LSTMs due to their simpler architecture and faster inference.
- Stateful Processing: For streaming data, you’ll need to maintain the hidden state between batches:
hidden = None  # Initial state
for batch in streaming_data:
    output, hidden = model(batch, hidden)
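A runnable sketch of that stateful pattern with a GRU, feeding a stream chunk by chunk. Note the `detach()`: without it the autograd graph from earlier chunks stays alive and memory grows without bound during training (the chunk sizes here are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.GRU(input_size=4, hidden_size=8, batch_first=True)

streaming_data = [torch.randn(1, 10, 4) for _ in range(5)]  # five chunks of a stream

hidden = None  # initial state
outputs = []
for chunk in streaming_data:
    out, hidden = model(chunk, hidden)
    hidden = hidden.detach()  # carry the state forward, but cut the autograd graph
    outputs.append(out)

print(len(outputs), outputs[0].shape)  # 5 torch.Size([1, 10, 8])
```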
Recurrent Neural Networks remain a powerful tool in the deep learning toolkit, especially for sequential data. While newer architectures like Transformers have gained popularity for certain tasks, RNNs still offer an excellent balance of performance and efficiency for many real-world applications.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.