In my decade-plus journey as a Python developer, I’ve witnessed the evolution of deep learning frameworks, with PyTorch emerging as one of the most useful tools in the field. When building neural networks, activation functions play a crucial role in introducing non-linearity to our models.
The hyperbolic tangent function, commonly known as TanH, is one of those essential activation functions I find myself using regularly. It squeezes input values between -1 and 1, making it particularly useful for certain types of neural networks.
In this guide, I’ll walk you through everything you need to know about implementing and optimizing the TanH activation function in PyTorch.
TanH Function
The TanH (hyperbolic tangent) function is mathematically defined as:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Unlike the sigmoid function, which maps inputs to the range (0, 1), TanH maps inputs to the range (-1, 1). This symmetry around zero makes it particularly useful in many applications.
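To build intuition, here is a quick sanity check (a minimal sketch) that computes this definition directly and compares it against PyTorch's built-in implementation:

```python
import torch

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Compute tanh from its definition: (e^x - e^(-x)) / (e^x + e^(-x))
manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

# Compare against the built-in implementation
print(torch.allclose(manual, torch.tanh(x)))  # True
```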
Here’s how the TanH function looks graphically:
- When x approaches negative infinity, TanH approaches -1
- When x approaches positive infinity, TanH approaches 1
- When x = 0, TanH = 0
This function is differentiable, which makes it suitable for backpropagation during neural network training.
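Since differentiability is what makes TanH work with backpropagation, here is a small sketch that uses autograd to verify the well-known derivative, d/dx tanh(x) = 1 - tanh²(x):

```python
import torch

# The derivative of tanh is 1 - tanh(x)^2; verify it with autograd
x = torch.tensor([-2.0, 0.0, 2.0], requires_grad=True)
y = torch.tanh(x)

# Backpropagate a vector of ones to get the element-wise dy/dx
y.backward(torch.ones_like(y))

analytic = 1 - torch.tanh(x.detach()) ** 2
print(torch.allclose(x.grad, analytic))  # True
```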
Implement TanH in PyTorch
PyTorch provides multiple ways to implement the TanH activation function. Let’s explore each method with practical examples.
Method 1: Use torch.nn.Tanh()
The simplest way to implement TanH in your neural network is with the built-in torch.nn.Tanh() module.
import torch
import torch.nn as nn
# Create a simple neural network with TanH activation
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.tanh = nn.Tanh()
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.tanh(x)
        x = self.fc2(x)
        return x

# Create a model instance
model = SimpleNN()

# Create a random input tensor
input_tensor = torch.randn(5, 10)  # Batch of 5 samples, 10 features each

# Forward pass
output = model(input_tensor)
print(output.shape)

Output:

torch.Size([5, 1])
This method is ideal when building models using PyTorch’s sequential or modular approach. I often use this in production-level code for its clarity and maintainability.
Method 2: Use torch.tanh() Function
If you prefer a more functional approach, PyTorch provides the torch.tanh() function:
import torch

# Create a random tensor
x = torch.randn(3, 4)
print("Original tensor:")
print(x)

# Apply TanH activation
y = torch.tanh(x)
print("\nAfter TanH activation:")
print(y)

# Verify that all values lie in the range (-1, 1)
print("\nMinimum value:", y.min().item())
print("Maximum value:", y.max().item())
I find this method particularly useful when I need to apply TanH to specific tensors during custom operations or when writing research code that requires more flexibility.
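Beyond torch.tanh(), tensors also expose a method form and an in-place variant. Here is a short sketch showing that all of them produce identical values:

```python
import torch

x = torch.randn(3, 4)

# Method form, equivalent to torch.tanh(x)
y = x.tanh()

# In-place form (note the trailing underscore): overwrites the tensor's
# storage directly, avoiding allocation of a new tensor
z = x.clone()
z.tanh_()

print(torch.allclose(y, torch.tanh(x)))  # True
print(torch.allclose(y, z))              # True
```

The in-place variant can save memory on large tensors, but avoid it on tensors that autograd still needs for the backward pass.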
Method 3: Use Functional API
PyTorch’s functional API offers another elegant way to implement TanH:
import torch
import torch.nn.functional as F
# Create a neural network with functional TanH
class FunctionalNN(torch.nn.Module):
    def __init__(self):
        super(FunctionalNN, self).__init__()
        self.fc1 = torch.nn.Linear(10, 20)
        self.fc2 = torch.nn.Linear(20, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.tanh(x)  # Using functional TanH
        x = self.fc2(x)
        return x

This approach is excellent for models where you want to keep the module definitions clean while specifying activations in the forward pass.
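As a quick sanity check, the module, functional, and plain-tensor forms all compute the same values. Note that some PyTorch versions steer you toward torch.tanh() instead of F.tanh, but the results are identical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)

# All three forms apply the same element-wise operation
module_out = nn.Tanh()(x)
functional_out = F.tanh(x)
plain_out = torch.tanh(x)

print(torch.allclose(module_out, functional_out))  # True
print(torch.allclose(module_out, plain_out))       # True
```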
Practical Example: Sentiment Analysis of US Customer Reviews
Let’s implement a practical example where TanH activation can be beneficial, a sentiment analysis model for American customer reviews:
import torch
import torch.nn as nn
import torch.optim as optim
# Sample data (in practice, you would load real US customer reviews)
# 1 for positive, 0 for negative sentiment
reviews = [
    "This product exceeded my expectations! Shipping was fast too.",
    "Not worth the money. Poor quality and arrived damaged.",
    "Great customer service when I had questions about my order.",
    "The size runs small. Had to return it which was a hassle."
]
labels = torch.tensor([1, 0, 1, 0], dtype=torch.float32).reshape(-1, 1)

# For illustration (in a real scenario, you'd use proper NLP techniques)
# Simple embedding: converting text length and positive word count to features
def simple_features(text):
    positive_words = ['great', 'good', 'exceeded', 'fast', 'worth']
    count = sum(1 for word in text.lower().split() if word in positive_words)
    length = len(text)
    return [count, length]

features = torch.tensor([simple_features(review) for review in reviews], dtype=torch.float32)

# Simple model with TanH activation
class SentimentModel(nn.Module):
    def __init__(self):
        super(SentimentModel, self).__init__()
        self.fc1 = nn.Linear(2, 5)
        self.tanh = nn.Tanh()
        self.fc2 = nn.Linear(5, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.tanh(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

# Create and train model
model = SentimentModel()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop
for epoch in range(1000):
    # Forward pass
    outputs = model(features)
    loss = criterion(outputs, labels)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Test the model
with torch.no_grad():
    predicted = model(features)
    print("\nPredictions:")
    for i, review in enumerate(reviews):
        sentiment = "Positive" if predicted[i].item() > 0.5 else "Negative"
        confidence = max(predicted[i].item(), 1 - predicted[i].item())
        print(f"Review: {review}")
        print(f"Predicted sentiment: {sentiment} (confidence: {confidence:.2f})\n")
In this example, TanH helps normalize the intermediate activations, which can lead to more stable training for this sentiment analysis task.
TanH vs. Other Activation Functions
When should you choose TanH over other activation functions? Here’s my experience:
- TanH vs. ReLU: TanH works better when you need outputs centered around zero. ReLU is often faster and helps mitigate the vanishing gradient problem, but outputs only non-negative values.
- TanH vs. Sigmoid: Both map to bounded ranges, but TanH’s range of (-1, 1) is zero-centered, which often leads to faster convergence than sigmoid’s range of (0, 1).
- TanH vs. Leaky ReLU: Leaky ReLU typically performs better for very deep networks, while TanH can be preferable for recurrent neural networks.
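The zero-centering point above is easy to see numerically. For an input symmetric around zero, TanH outputs average to zero, while sigmoid and ReLU outputs do not (a quick sketch):

```python
import torch

# A symmetric input around zero: [-3, -2, -1, 0, 1, 2, 3]
x = torch.linspace(-3, 3, steps=7)

tanh_mean = torch.tanh(x).mean().item()
sigmoid_mean = torch.sigmoid(x).mean().item()
relu_mean = torch.relu(x).mean().item()

print(f"tanh mean:    {tanh_mean:.4f}")     # ~0.0 (zero-centered)
print(f"sigmoid mean: {sigmoid_mean:.4f}")  # 0.5
print(f"relu mean:    {relu_mean:.4f}")     # > 0, all outputs non-negative
```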
In my US-based projects, I’ve found TanH particularly useful for:
- Recurrent Neural Networks (RNNs and LSTMs)
- Models with normalized inputs
- Financial time series prediction (especially for US stock market data)
- Control systems with bipolar outputs
Optimize TanH Usage
After years of working with PyTorch, I’ve discovered several optimization tips for TanH:
- Input normalization: TanH works best when inputs are normalized around zero.
- Batch normalization: Adding batch normalization before TanH can improve training stability.
class ImprovedNN(nn.Module):
    def __init__(self):
        super(ImprovedNN, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.bn = nn.BatchNorm1d(20)
        self.tanh = nn.Tanh()
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.bn(x)
        x = self.tanh(x)
        x = self.fc2(x)
        return x

- Xavier/Glorot initialization: This initialization works particularly well with TanH:
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

model = SimpleNN()
model.apply(init_weights)

- Learning rate adjustment: TanH may require lower learning rates compared to ReLU-based networks.
Common Issues and Solutions
In my years of using PyTorch’s TanH, I’ve encountered these common issues:
- Vanishing gradients: TanH can suffer from vanishing gradients with very deep networks. Solution: Use residual connections or consider alternatives for very deep networks.
- Slower convergence: TanH networks sometimes train slower than ReLU networks. Solution: Use adaptive optimizers like Adam instead of SGD.
- Output range limitation: The (-1, 1) range may require adjustment for certain tasks. Solution: Add a scaling layer after TanH if needed.
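That scaling idea can be sketched as a small custom module (ScaledTanh is a hypothetical name for illustration, not a built-in PyTorch class):

```python
import torch
import torch.nn as nn

# Hypothetical scaled-TanH layer: stretches the (-1, 1) output range to
# (-scale, scale), e.g. for regression targets outside (-1, 1)
class ScaledTanh(nn.Module):
    def __init__(self, scale=10.0):
        super().__init__()
        self.scale = scale

    def forward(self, x):
        return self.scale * torch.tanh(x)

head = ScaledTanh(scale=10.0)
out = head(torch.randn(5, 1) * 4)

# Outputs stay strictly inside (-10, 10)
print(bool(out.abs().max() < 10.0))  # True
```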
Working with TanH in PyTorch has been a valuable part of my deep learning toolkit over the years. While newer activation functions get more attention, TanH remains extremely useful for specific applications, especially in the financial technology sector, which is prominent in the US market.
By understanding when and how to use TanH effectively, you can build more powerful and stable neural networks for a wide range of applications. Whether you’re analyzing American consumer sentiment or predicting US economic trends, the TanH activation function deserves a place in your PyTorch arsenal.
Remember that the best way to determine if TanH is right for your specific use case is through experimentation. PyTorch makes it easy to swap activation functions, so don’t hesitate to try different options and compare results.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last five years. During this time I gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.