Recently, I was working on a project that required processing sequential data for a natural language processing task. The problem is that traditional feedforward neural networks don’t handle sequential data well, so a specialized approach is needed.
In this article, I will cover how to implement and use Recurrent Neural Networks (RNNs) in PyTorch with practical examples.
Let’s get started!
What is an RNN?
Recurrent Neural Networks are specialized neural networks designed to work with sequential data such as text, time series, or speech. Unlike traditional neural networks, RNNs have connections that loop back, allowing information to persist through time steps.
In PyTorch, implementing RNNs is straightforward thanks to built-in modules that handle the complex recurrent calculations for us.
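To make the “loop back” idea concrete, here is a small sketch (not from the article’s code) that reproduces one step of `nn.RNN` by hand. The update is `h_t = tanh(W_ih·x_t + b_ih + W_hh·h_(t-1) + b_hh)`, using the layer’s own parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)

x = torch.randn(1, 1, 3)   # one batch, one time step, three features
h0 = torch.zeros(1, 1, 4)  # initial hidden state

# Let PyTorch compute one recurrent step...
out, h1 = rnn(x, h0)

# ...and reproduce it manually from the layer's parameters
manual = torch.tanh(
    x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
print(torch.allclose(out[0], manual, atol=1e-6))  # True
```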
Set Up Your Environment
Before we start coding RNNs, let’s make sure we have everything set up:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Build a Basic RNN in PyTorch
Let’s create a simple RNN for text classification:
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.rnn(x, h0)
        # Classify from the output of the last time step
        out = self.fc(out[:, -1, :])
        return out
# Dummy input: batch_size=4, sequence_length=10, input_size=8
x_rnn = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
rnn_model = SimpleRNN(input_size=8, hidden_size=16, output_size=2).to(device)
output_rnn = rnn_model(x_rnn)
print("RNN Output:\n", output_rnn)
This model takes an input sequence, processes it through an RNN layer, and then passes the final hidden state through a fully connected layer to get our output.
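A quick way to convince yourself that `out[:, -1, :]` really is the final hidden state: for a single-layer, unidirectional `nn.RNN`, the last time step of `out` equals `h_n`. A small self-contained check:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)

out, h_n = rnn(x)  # out: (4, 10, 16), h_n: (1, 4, 16)
print(torch.allclose(out[:, -1, :], h_n[0]))  # True
```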
Implement LSTM Networks
Long Short-Term Memory (LSTM) networks are a special kind of RNN designed to avoid the vanishing gradient problem. They’re great for longer sequences:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out
# Dummy input
x_lstm = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
lstm_model = LSTMModel(input_size=8, hidden_size=16, output_size=2).to(device)
output_lstm = lstm_model(x_lstm)
print("LSTM Output:\n", output_lstm)
Work with GRUs
Gated Recurrent Units (GRUs) are another variant of RNNs that are simpler than LSTMs but still solve the vanishing gradient problem:
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(GRUModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.gru(x, h0)
        out = self.fc(out[:, -1, :])
        return out
# Dummy input
x_gru = torch.randn(4, 10, 8).to(device)
# Instantiate and run model
gru_model = GRUModel(input_size=8, hidden_size=16, output_size=2).to(device)
output_gru = gru_model(x_gru)
print("GRU Output:\n", output_gru)
Output:
RNN Output:
tensor([[ 0.5106, -0.2470],
[ 0.3434, -0.3448],
[ 0.2401, -0.3848],
[ 0.4531, -0.1722]], grad_fn=<AddmmBackward0>)
LSTM Output:
tensor([[-0.2076, 0.3041],
[-0.2976, 0.1294],
[-0.2230, 0.0996],
[-0.3048, 0.1863]], grad_fn=<AddmmBackward0>)
GRU Output:
tensor([[0.1704, 0.1796],
[0.2411, 0.2614],
[0.1312, 0.1723],
[0.1640, 0.1732]], grad_fn=<AddmmBackward0>)

Practical Example: Sentiment Analysis
Let’s implement a practical example using RNNs for sentiment analysis on movie reviews:
import torch
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
# Step 1: Tokenizer
tokenizer = get_tokenizer("basic_english")
# Step 2: Helper to yield tokens
def yield_tokens(data_iter):
    for label, line in data_iter:
        yield tokenizer(line)
# Step 3: Load IMDB (train split)
train_iter = IMDB(split='train')
# Step 4: Build vocabulary
vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])
print(f"Vocab size: {len(vocab)}")
Output:
Vocab size: 100683
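To see what the vocabulary actually does at lookup time, here is a plain-Python sketch of the same behavior: tokens map to indices, and unseen tokens fall back to the `<unk>` index (this mimics `vocab.set_default_index`; it is not torchtext code, and the tiny mapping is made up):

```python
# Toy stand-in for a torchtext vocab: token -> index, with <unk> fallback
token_to_index = {"<unk>": 0, "this": 1, "movie": 2, "was": 3, "great": 4}

def numericalize(tokens, mapping, unk_index=0):
    # Unknown tokens get the <unk> index, just like set_default_index
    return [mapping.get(tok, unk_index) for tok in tokens]

ids = numericalize(["this", "movie", "was", "terrible"], token_to_index)
print(ids)  # [1, 2, 3, 0] -- "terrible" is out of vocabulary
```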

Train an RNN Model
Now let’s see how to train our RNN model:
def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        optimizer.zero_grad()
        # Assumes a legacy torchtext iterator whose batches expose .text and .label
        text, text_lengths = batch.text
        predictions = model(text).squeeze(1)
        loss = criterion(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)
# Optimizer and loss function (assumes `model` is the sentiment RNN built for this task)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
# Training loop
N_EPOCHS = 5
for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    print(f'Epoch: {epoch+1}, Train Loss: {train_loss:.3f}')
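The matching evaluation loop disables dropout and gradient tracking. Here is a self-contained sketch of that pattern; the dummy model and batches below are stand-ins just to show the call shape, and the real `model` and iterator from above would be substituted in:

```python
import torch
import torch.nn as nn

def evaluate(model, batches, criterion):
    model.eval()  # switch off dropout / batch-norm updates
    total_loss = 0.0
    with torch.no_grad():  # no gradients needed at evaluation time
        for text, labels in batches:
            predictions = model(text).squeeze(1)
            total_loss += criterion(predictions, labels).item()
    return total_loss / len(batches)

# Dummy stand-ins just to exercise the function
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 1))
batches = [(torch.randn(4, 8), torch.randint(0, 2, (4,)).float()) for _ in range(3)]
loss = evaluate(model, batches, nn.BCEWithLogitsLoss())
print(f"Eval loss: {loss:.3f}")
```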
Time Series Forecasting with RNNs
RNNs are excellent for time series data. Here’s a simple example for stock price prediction:
# Create a sample time series dataset
def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
# Sample data (e.g., stock prices)
data = np.sin(np.linspace(0, 100, 1000))
seq_length = 50
X, y = create_sequences(data, seq_length)
# Convert to PyTorch tensors
X_tensor = torch.FloatTensor(X).unsqueeze(2) # Add feature dimension
y_tensor = torch.FloatTensor(y)
# Define a time series RNN model
class TimeSeriesRNN(nn.Module):
    def __init__(self, input_size=1, hidden_size=50, output_size=1):
        super(TimeSeriesRNN, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.linear(out[:, -1, :])
        return out
# Create model, loss function, and optimizer
model = TimeSeriesRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Training loop
n_epochs = 100
batch_size = 32
for epoch in range(n_epochs):
    model.train()
    for i in range(0, len(X_tensor), batch_size):
        batch_X = X_tensor[i:i+batch_size]
        batch_y = y_tensor[i:i+batch_size]
        optimizer.zero_grad()
        outputs = model(batch_X).squeeze()
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}/{n_epochs}, Loss: {loss.item():.4f}')
RNN vs LSTM vs GRU: When to Use Each
Each type of recurrent network has its strengths:
- Basic RNNs: Good for short sequences where long-term dependencies aren’t critical.
- LSTMs: Excellent for long sequences where capturing long-term dependencies is important.
- GRUs: A good middle ground – simpler than LSTMs but still handles long-term dependencies well.
In my experience, LSTMs tend to work best for complex sequence tasks like language modeling, while GRUs can be more efficient for simpler tasks.
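One way to see the “simpler than LSTMs” claim concretely: for the same input and hidden sizes, a GRU carries three gates’ worth of parameters to an LSTM’s four (and a vanilla RNN’s one). A quick count:

```python
import torch.nn as nn

def n_params(module):
    # Total number of trainable scalars in a module
    return sum(p.numel() for p in module.parameters())

rnn = n_params(nn.RNN(8, 16, batch_first=True))
gru = n_params(nn.GRU(8, 16, batch_first=True))
lstm = n_params(nn.LSTM(8, 16, batch_first=True))
print(rnn, gru, lstm)  # 416 1248 1664 -- GRU = 3x a plain RNN, LSTM = 4x
```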
Tips for Optimizing RNN Performance
- Use Gradient Clipping: This prevents exploding gradients, a common issue in RNNs.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
- Batch Normalization: Can help stabilize and speed up training.
self.bn = nn.BatchNorm1d(hidden_size)
- Dropout: Helps prevent overfitting.
self.dropout = nn.Dropout(0.5)
- Bidirectional RNNs: Consider these for tasks where context from both directions matters.
self.lstm = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)
I hope you found this article helpful. RNNs are powerful tools for working with sequential data in PyTorch, and with variants like LSTM and GRU, you can tackle complex problems ranging from natural language processing to time series forecasting.
Remember that the best RNN architecture depends on your specific task: experiment with different configurations to find what works best for your data and problem.
Real-World Applications of RNNs
RNNs have numerous applications across various domains. Let me share a few examples I’ve worked with:
Natural Language Processing
One of my recent projects involved building a text generation system for creating product descriptions:
class TextGenerationRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=dropout, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, hidden=None):
        # text shape: [batch size, sequence length]
        embedded = self.dropout(self.embedding(text))
        # embedded shape: [batch size, sequence length, embedding dim]
        if hidden is None:
            output, (hidden, cell) = self.lstm(embedded)
        else:
            output, (hidden, cell) = self.lstm(embedded, hidden)
        # output shape: [batch size, sequence length, hidden dim]
        output = self.dropout(output)
        prediction = self.fc(output)
        # prediction shape: [batch size, sequence length, vocab size]
        return prediction, (hidden, cell)
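Once a model like this is trained, text is produced by feeding tokens back in one at a time and reusing the hidden state. The sketch below shows greedy decoding with a deliberately tiny, untrained stand-in model (the vocabulary size, dimensions, and class name here are illustrative, not from the original project):

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Minimal generator with the same interface as TextGenerationRNN above."""
    def __init__(self, vocab_size=20, embedding_dim=8, hidden_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        out, hidden = self.lstm(self.embedding(tokens), hidden)
        return self.fc(out), hidden

def generate(model, start_token, max_len=10):
    model.eval()
    tokens = [start_token]
    hidden = None
    inp = torch.tensor([[start_token]])
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits, hidden = model(inp, hidden)      # carry the state forward
            next_token = logits[0, -1].argmax().item()  # greedy choice
            tokens.append(next_token)
            inp = torch.tensor([[next_token]])
    return tokens

torch.manual_seed(0)
gen = generate(TinyGenerator(), start_token=1)
print(len(gen))  # 10 tokens generated
```

In practice you would sample from the softmax (optionally with a temperature) rather than taking the argmax, which tends to produce repetitive text.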
Predictive Analytics for Business
For a U.S. retail client, I implemented an RNN to forecast sales based on historical data:
class SalesForecaster(nn.Module):
    def __init__(self, input_features, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.3 if num_layers > 1 else 0
        )
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x shape: [batch_size, seq_len, features]
        lstm_out, _ = self.lstm(x)
        # We only want the last time step for forecasting
        y_pred = self.linear(lstm_out[:, -1, :])
        return y_pred
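A quick smoke test of this kind of forecaster with made-up numbers (say, 8 stores, 12 weeks of history, 5 features per week; all sizes here are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Minimal stand-in with the same shape contract as SalesForecaster above
lstm = nn.LSTM(input_size=5, hidden_size=64, num_layers=2, batch_first=True, dropout=0.3)
head = nn.Linear(64, 1)

x = torch.randn(8, 12, 5)            # 8 stores, 12 weeks, 5 features each
lstm_out, _ = lstm(x)                # (8, 12, 64)
forecast = head(lstm_out[:, -1, :])  # one predicted value per store
print(forecast.shape)  # torch.Size([8, 1])
```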
Handle Variable Length Sequences
In real-world applications, sequence lengths often vary. Here’s how to handle that efficiently:
def prepare_sequence_batch(sequences):
    """Pack variable length sequences for efficient processing"""
    # Sort sequences by length in descending order
    sequences.sort(key=lambda x: len(x), reverse=True)
    lengths = [len(seq) for seq in sequences]
    # Pad sequences
    padded_seqs = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
    # Pack padded sequences
    packed_seqs = torch.nn.utils.rnn.pack_padded_sequence(
        padded_seqs, lengths, batch_first=True
    )
    return packed_seqs
# In your model:
packed_output, (hidden, cell) = self.lstm(packed_sequences)
# Unpack if needed
output, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
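Here is a self-contained round trip with three sequences of different lengths, showing that packing preserves the data and that padded positions come back as zeros (lengths must be in descending order for the default `enforce_sorted=True`):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three sequences of lengths 5, 3, and 2, each with 2 features per step
seqs = [torch.ones(5, 2), torch.ones(3, 2) * 2, torch.ones(2, 2) * 3]
lengths = [len(s) for s in seqs]

padded = pad_sequence(seqs, batch_first=True)              # (3, 5, 2), zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True)
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)

print(out_lengths.tolist())           # [5, 3, 2] -- original lengths recovered
print(torch.equal(unpacked, padded))  # True -- round trip is lossless
```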
Transfer Learning with RNNs
Transfer learning is also possible with RNNs. For a recent project analyzing social media sentiment for a U.S. company, I fine-tuned a pre-trained language model:
# Load pre-trained embedding vectors (like GloVe)
pretrained_embeddings = torch.FloatTensor(np.load('glove_embeddings.npy'))
class TransferRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, pretrained_embeddings):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Load pretrained embeddings
        self.embedding.weight.data.copy_(pretrained_embeddings)
        # Freeze embeddings
        self.embedding.weight.requires_grad = False
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, text):
        embedded = self.embedding(text)
        output, (hidden, cell) = self.rnn(embedded)
        # Concatenate the final forward and backward hidden states
        hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        return self.fc(hidden)
Performance Optimization and Scalability
When working with large datasets, optimizing RNN performance becomes crucial:
Vectorized Operations
Whenever possible, use vectorized operations instead of Python loops:
# Instead of processing elements one at a time in Python loops:
# for i in range(batch_size):
#     for j in range(seq_length):
#         ...  # process each element
# use a single vectorized operation over the whole tensor:
processed_batch = torch.nn.functional.relu(batch_tensor)
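To check that the two approaches agree, the sketch below fills in the loop body with an element-wise ReLU and compares it to the vectorized call (the explicit loop is purely illustrative; you would never write it in practice):

```python
import torch

torch.manual_seed(0)
batch_tensor = torch.randn(4, 6)

# Slow, element-by-element version of ReLU
looped = torch.empty_like(batch_tensor)
for i in range(batch_tensor.size(0)):
    for j in range(batch_tensor.size(1)):
        looped[i, j] = max(batch_tensor[i, j].item(), 0.0)

# Vectorized version: one call over the whole tensor
vectorized = torch.nn.functional.relu(batch_tensor)
print(torch.allclose(looped, vectorized))  # True
```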
GPU Acceleration
Moving your model to GPU can significantly speed up training:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
inputs = inputs.to(device)
Mixed Precision Training
For very large models, consider using mixed precision training:
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Deployment Considerations
When deploying RNN models to production, keep these factors in mind:
- Model Size: RNNs, especially LSTMs with many layers, can be large. Consider quantization or distillation for mobile/edge devices.
- Inference Speed: For real-time applications, GRUs may be preferable to LSTMs due to their simpler architecture and faster inference.
- Stateful Processing: For streaming data, you’ll need to maintain the hidden state between batches:
hidden = None  # Initial state
for batch in streaming_data:
    output, hidden = model(batch, hidden)
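A runnable sketch of that stateful pattern with a GRU, feeding a stream chunk by chunk. Note the `detach()`: without it the autograd graph from earlier chunks stays alive and memory grows without bound during training (the chunk sizes here are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.GRU(input_size=4, hidden_size=8, batch_first=True)

streaming_data = [torch.randn(1, 10, 4) for _ in range(5)]  # five chunks of a stream

hidden = None  # initial state
outputs = []
for chunk in streaming_data:
    out, hidden = model(chunk, hidden)
    hidden = hidden.detach()  # carry the state forward, but cut the autograd graph
    outputs.append(out)

print(len(outputs), outputs[0].shape)  # 5 torch.Size([1, 10, 8])
```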
Recurrent Neural Networks remain a powerful tool in the deep learning toolkit, especially for sequential data. While newer architectures like Transformers have gained popularity for certain tasks, RNNs still offer an excellent balance of performance and efficiency for many real-world applications.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.