Recently, I was working on a deep learning project where I needed to choose the right activation function for my neural network. The challenge was that TensorFlow offers so many options, each with different properties and use cases. So, which one should I choose?
In this guide, I’ll share everything I’ve learned about TensorFlow activation functions over my years of experience. I’ll cover when to use each function, their strengths and weaknesses, and practical code examples you can implement right away.
So let’s dive in!
Activation Functions
Activation functions are a critical component in neural networks that introduce non-linearity into the model. Without them, your neural network would just be a fancy linear regression, regardless of how many layers you add.
Think of activation functions as the decision-makers in your neural network. They determine whether a neuron should be activated or not based on the weighted sum of inputs.
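To see why non-linearity matters, here is a tiny sketch (the layer sizes and random input are just placeholders) showing that two Dense layers with no activation collapse into a single matrix multiplication:
import tensorflow as tf
import numpy as np
# Two Dense layers with no activation in between (and no bias, to keep the math obvious)
linear_stack = tf.keras.Sequential([
    tf.keras.layers.Dense(4, use_bias=False, input_shape=(3,)),
    tf.keras.layers.Dense(2, use_bias=False)
])
x = np.random.rand(5, 3).astype(np.float32)
# Output of the two stacked layers
stacked = linear_stack(x).numpy()
# The same result from one combined weight matrix: stacking added no expressive power
W1 = linear_stack.layers[0].kernel.numpy()
W2 = linear_stack.layers[1].kernel.numpy()
combined = x @ (W1 @ W2)
print(np.allclose(stacked, combined))  # True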
Common Activation Functions in TensorFlow
Now, I will explain some common activation functions in TensorFlow.
ReLU (Rectified Linear Unit)
ReLU is probably the most commonly used activation function in deep learning today. It’s simple yet effective.
import tensorflow as tf
import numpy as np
# Using ReLU in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Generate dummy data (1000 samples of 784 features and labels from 0 to 9)
X_dummy = np.random.random((1000, 784))
y_dummy = np.random.randint(0, 10, size=(1000,))
# Train the model on dummy data
model.fit(X_dummy, y_dummy, epochs=5, batch_size=32)
# Evaluate the model
loss, accuracy = model.evaluate(X_dummy, y_dummy)
print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")You can refer to the screenshot below to see the output.

ReLU simply outputs the input if it’s positive; otherwise, it outputs zero. This simplicity makes it computationally efficient while still introducing the necessary non-linearity.
However, ReLU can suffer from the “dying ReLU” problem: if a neuron’s weighted input stays negative (for example, after a large weight update), its output and gradient are both zero, so the neuron stops learning and can remain permanently inactive.
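Here is a quick check of the basic behavior on a few raw numbers (the values are arbitrary):
import tensorflow as tf
# ReLU keeps positive inputs unchanged and zeroes out negatives
x = tf.constant([-3.0, -0.5, 0.0, 2.0, 7.5])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  2.  7.5]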
Sigmoid
Sigmoid squashes values between 0 and 1, making it useful for outputs that represent probabilities.
import tensorflow as tf
import numpy as np
# Using Sigmoid in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
tf.keras.layers.Dense(1, activation='sigmoid') # Binary classification
])
# Generate dummy input (e.g., flattened 28x28 image)
dummy_input = np.random.rand(1, 784).astype(np.float32)
# Run the model to get a prediction
output = model(dummy_input)
# Print the output
print("Sigmoid output (probability):", output.numpy())You can refer to the screenshot below to see the output.

I’ve found sigmoid particularly useful for binary classification problems where you need to predict one of two classes. However, it’s prone to the vanishing gradient problem in deep networks, because its gradient approaches zero for large positive or negative inputs.
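If you want to see the squashing directly, here is a minimal sketch on a few made-up logits (the 0.5 threshold is just the usual convention, not a requirement):
import tensorflow as tf
# Sigmoid maps any real number into the (0, 1) range
logits = tf.constant([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = tf.math.sigmoid(logits)
print(probs.numpy())           # roughly [0.018 0.269 0.5 0.731 0.982]
# Typical binary-classification decision: threshold the probability at 0.5
print((probs > 0.5).numpy())   # [False False False  True  True]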
Tanh (Hyperbolic Tangent)
Tanh is similar to sigmoid but outputs values between -1 and 1, which can help with convergence in some networks.
import tensorflow as tf
import numpy as np
# Using Tanh in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='tanh', input_shape=(784,)),
tf.keras.layers.Dense(10, activation='softmax')
])
# Generate dummy input (e.g., flattened 28x28 image)
dummy_input = np.random.rand(1, 784).astype(np.float32)
# Run the model to get a prediction
output = model(dummy_input)
# Print the output probabilities (softmax output)
print("Model output probabilities:", output.numpy())You can refer to the screenshot below to see the output.

I often use tanh for hidden layers when I need values centered around zero, which can help with the learning process.
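Here is the zero-centered range on a few sample values (arbitrary numbers, just for illustration):
import tensorflow as tf
# Tanh outputs lie between -1 and 1 and are centered around zero
x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0])
print(tf.math.tanh(x).numpy())  # roughly [-0.995 -0.762  0.     0.762  0.995]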
Softmax
Softmax is perfect for multi-class classification problems, as it converts a vector of values into a probability distribution.
# Using Softmax in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
tf.keras.layers.Dense(10, activation='softmax') # 10 classes (e.g., MNIST digits)
])

When working with datasets like MNIST or CIFAR-10, softmax is typically the go-to activation function for the output layer.
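To see what softmax actually does to a vector of raw scores, here is a small sketch (the logits are made up):
import tensorflow as tf
# Softmax turns raw scores (logits) into a probability distribution
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
print(probs.numpy())                 # roughly [[0.659 0.242 0.099]]
print(tf.reduce_sum(probs).numpy())  # 1.0, the probabilities always sum to one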
Advanced Activation Functions
Let me show you some advanced activation functions.
Leaky ReLU
Leaky ReLU addresses the dying ReLU problem by allowing a small negative slope instead of zero.
# Using Leaky ReLU in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, input_shape=(784,)),
tf.keras.layers.LeakyReLU(alpha=0.01), # Small slope for negative inputs
tf.keras.layers.Dense(10, activation='softmax')
])

I’ve found this particularly helpful when training very deep networks where ReLU neurons might die out during training.
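A quick numeric check shows the small negative slope in action (alpha=0.01 matches the layer above; the input values are arbitrary):
import tensorflow as tf
# Leaky ReLU passes a small fraction of negative values instead of zeroing them out
x = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tf.nn.leaky_relu(x, alpha=0.01).numpy())  # [-0.1  -0.01  0.    1.   10.  ]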
PReLU (Parametric ReLU)
PReLU takes Leaky ReLU one step further by making the negative slope a learnable parameter.
# Using PReLU in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, input_shape=(784,)),
tf.keras.layers.PReLU(),
tf.keras.layers.Dense(10, activation='softmax')
])

This adds flexibility to your model, allowing it to learn the optimal negative slope during training.
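If you want to confirm that the slope really is a trainable parameter, here is a minimal sketch inspecting a standalone PReLU layer (the input size of 4 is just an example):
import tensorflow as tf
# PReLU stores its negative slope as a trainable weight, one alpha per unit by default
layer = tf.keras.layers.PReLU()
layer.build((None, 4))           # build the layer for 4 input features
print(layer.weights[0].numpy())  # [0. 0. 0. 0.], starts at zero and is updated by the optimizer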
ELU (Exponential Linear Unit)
ELU uses an exponential function to provide smooth negative values, which can lead to faster convergence.
# Using ELU in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, input_shape=(784,)),
tf.keras.layers.ELU(alpha=1.0),
tf.keras.layers.Dense(10, activation='softmax')
])

I’ve had good results with ELU in networks where ReLU wasn’t performing well, particularly with noisy data.
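Here is how ELU behaves on a few sample values: linear for positive inputs, smoothly saturating toward negative alpha (the default 1.0) for negative ones:
import tensorflow as tf
# ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0
x = tf.constant([-5.0, -1.0, 0.0, 1.0, 5.0])
print(tf.nn.elu(x).numpy())  # roughly [-0.993 -0.632  0.     1.     5.   ]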
SELU (Scaled Exponential Linear Unit)
SELU is designed to self-normalize neural networks, which can help with training stability.
# Using SELU in a TensorFlow model
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, activation='selu', input_shape=(784,)),
tf.keras.layers.Dense(64, activation='selu'),
tf.keras.layers.Dense(10, activation='softmax')
])

For best results with SELU, use the ‘lecun_normal’ kernel initializer and consider adding AlphaDropout layers instead of regular Dropout.
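Here is a minimal sketch of that recommended setup; the dropout rate of 0.1 is just an example value, not a tuned choice:
import tensorflow as tf
# SELU works best with lecun_normal initialization and AlphaDropout,
# which preserve the self-normalizing property
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='selu',
                          kernel_initializer='lecun_normal', input_shape=(784,)),
    tf.keras.layers.AlphaDropout(0.1),
    tf.keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    tf.keras.layers.AlphaDropout(0.1),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()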
Implement Custom Activation Functions
Sometimes you might need a custom activation function that isn’t available in TensorFlow. Here’s how to create one:
def custom_activation(x):
    # Example: a modified ReLU that caps values at 5
    return tf.minimum(tf.maximum(0.0, x), 5.0)
# Using the custom activation function
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, input_shape=(784,)),
tf.keras.layers.Lambda(custom_activation),
tf.keras.layers.Dense(10, activation='softmax')
])

You can also use the tf.keras.layers.Activation layer with a custom function:
model = tf.keras.Sequential([
tf.keras.layers.Dense(128, input_shape=(784,)),
tf.keras.layers.Activation(custom_activation),
tf.keras.layers.Dense(10, activation='softmax')
])
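Either way, it’s worth sanity-checking the function on a few values before training with it (the test values are arbitrary):
import tensorflow as tf

def custom_activation(x):
    # Modified ReLU that caps values at 5
    return tf.minimum(tf.maximum(0.0, x), 5.0)

# Negatives become 0, values above 5 are capped at 5
x = tf.constant([-2.0, 1.0, 4.0, 9.0])
print(custom_activation(x).numpy())  # [0. 1. 4. 5.]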
Choose the Right Activation Function
Selecting the right activation function can significantly impact your model’s performance. Here are some guidelines I follow:
- For hidden layers: Start with ReLU as your default choice. If you encounter issues, try Leaky ReLU or ELU.
- For output layers (a short sketch of these three heads follows this list):
  - Binary classification: Sigmoid
  - Multi-class classification: Softmax
  - Regression problems: Linear (no activation)
- For very deep networks: Consider SELU with proper initialization.
- When vanishing gradients are an issue: Try ReLU variants like Leaky ReLU or ELU.
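As a quick reference for the output-layer choices above, here is a minimal sketch of the three typical heads (the layer sizes are placeholders):
import tensorflow as tf
# Typical output layers for the three task types
binary_head = tf.keras.layers.Dense(1, activation='sigmoid')       # binary classification
multiclass_head = tf.keras.layers.Dense(10, activation='softmax')  # multi-class classification (10 classes here)
regression_head = tf.keras.layers.Dense(1)                         # regression: linear, i.e. no activation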
Real-World Example: Image Classification
Let’s put this knowledge into practice with a real-world example of image classification using the Fashion MNIST dataset:
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
import numpy as np
# Load and preprocess the Fashion MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28*28)
x_test = x_test.reshape(-1, 28*28)
# Create a model with different activation functions
model = tf.keras.Sequential([
tf.keras.layers.Dense(256, input_shape=(784,)),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Dense(128),
tf.keras.layers.ELU(),
tf.keras.layers.Dense(64),
tf.keras.layers.ReLU(),
tf.keras.layers.Dense(10, activation='softmax')
])
# Compile and train the model
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")In this example, I’ve combined different activation functions in the same model to showcase how you might mix and match them in a real application.
Activation functions are a key ingredient in creating effective neural networks. While ReLU is a great starting point for most applications, understanding the full range of options gives you the flexibility to optimize your models for specific problems.
The best way to determine which activation function works best for your particular case is often through experimentation. I recommend trying different functions and comparing their performance on your validation data.
I hope you found this guide helpful.