While configuring an artificial neural network or machine learning model, **you usually specify the TensorFlow activation function for the layers. If you are a beginner, you may want to know, at a deeper level, what activation functions are.**

In this TensorFlow tutorial, I will explain everything about the TensorFlow activation function, like their use, type, and subtypes, and I will provide a mathematical explanation of how the activation function processes the input value.

Additionally, you will learn how to build a model with different activation functions.

I will explain some commonly used activation functions in deep learning and machine learning models.

Overall, I will be covering the following topics:

- What is the activation function?
- What does Linearity and Non-linearity mean in Activation Functions?
- Types of Activation Function
- Where to use Activation Functions in Neural Network
- Building Model with Tensorflow Activation Functions

Let’s begin,

## What is the activation function?

**The activation function is a way to turn input into meaningful output; it is like a gate between the input, fed to the perceptron (neuron), and its output, which goes to the next layer.**

If you have any doubt about neuron or activation function, visit the tutorial How to Build Perceptron in Python and Build Artificial Neural Network in Tensorflow.

As you may know, an artificial neural network consists of three kinds of layers: **input**, **hidden**, and **output**, and each layer consists of one or more perceptrons (neurons). A perceptron takes the input and processes it, and then the activation function is applied to the result to decide what value to pass to the next perceptron.

Here, I will explain why the activation function is used and how it contributes to the neural network model in learning.

So, firstly, **the activation function is used in neural networks to introduce non-linearity to the output of the perceptron, which makes neural networks learn more complex patterns.**

Generally, when the inputs are fed to the perceptron (neuron), it multiplies each input by a weight and computes the sum of these products. The activation function is then applied to this weighted sum to decide how strongly the perceptron fires, that is, how much of the signal is passed on for prediction.
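The weighted sum followed by an activation can be sketched in a few lines of plain Python. This is only an illustration: the input values, weights, and bias below are made up, and the helper names `perceptron` and `sigmoid` are hypothetical, not part of TensorFlow.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias, activation):
    """Compute the weighted sum of the inputs plus a bias, then apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]
bias = 0.1

# Weighted sum: 0.2 - 0.3 + 0.2 + 0.1 = 0.2, then sigmoid(0.2) is about 0.55.
output = perceptron(inputs, weights, bias, sigmoid)
print(output)
```

Swapping `sigmoid` for another function changes how the same weighted sum is turned into the perceptron's output.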

Here, the activation function introduces linearity and non-linearity to the perceptron’s output. Let’s understand what linearity and non-linearity mean in the context of the perceptron.

### What does Linearity mean in Activation Functions?

Linearity means a linear relationship where the change in the output is directly proportional to the change in the input. More simply, if you double the input value, the output will double.

In mathematics, this linear relationship can be represented by a straight line in a two-dimensional space, using a linear equation of the form **y = mx + c**, where **m** is the slope of the line and **c** is the **y-intercept**.

In the same way, a linear machine learning or neural network model predicts the outcome as a weighted sum of the input variables: a coefficient weights each input variable, and a constant bias term is usually added.

The key characteristics of a linear model are its simplicity and interpretability, but it is limited to problems where the relationship between the input and the target variable is linear.

This kind of linear relationship model is limited; it can’t learn complex patterns from the dataset.

### What does Non-Linearity mean in Activation Functions?

This kind of relationship is where the change in output is not directly proportional to the change in input. This means that doubling the input won’t necessarily double the output.

Non-linear relationships can represent more complex patterns, such as curves in a graph. They are essential for modelling the intricate relationships often found in real-world data, like fluctuations in the stock market, the growth rate of populations, or the processing of images and language in deep learning.

This non-linear transformation in neural networks is achieved through activation functions like **ReLU**, **sigmoid**, and **tanh**, which you will learn soon. These functions enable neural networks to understand and model complex, non-linear hypotheses that linear models cannot.

By applying these non-linear transformations, neural networks can understand sophisticated data structures and solve a vast range of problems beyond the reach of linear models.

**But why does non-linearity matter more than linearity in Neural Networks?**

Non-linearity allows the model to learn complex non-linear relationships in large datasets. This ability enables them to perform tasks such as image recognition, natural language processing, etc.

Without non-linearity, neural networks would be limited to solving only simple problems with linear relationships, significantly restricting their applicability and effectiveness in solving real-world problems.
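One way to see this limitation is to stack two layers without any activation and compare them with a single layer. The sketch below uses NumPy with random, made-up weight matrices (`W1`, `W2`) and no bias terms, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for two layers and a small batch of inputs.
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))

# Two layers with no activation in between.
two_layers = (x @ W1) @ W2

# A single layer whose weights are the product of the two matrices.
one_layer = x @ (W1 @ W2)

# The stacked linear layers collapse into one linear layer.
collapsed = np.allclose(two_layers, one_layer)
print(collapsed)

# With a non-linear activation (here max(0, z)) in between, they no longer collapse.
with_relu = np.maximum(0, x @ W1) @ W2
differs = not np.allclose(with_relu, one_layer)
print(differs)
```

However many linear layers you stack, the result is still a single linear transformation; the non-linear activation between layers is what lets depth add expressive power.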

## Types of Activation Function

There are lots of activation functions, each with unique characteristics and uses. Here, I will explain some common activation functions used to train neural networks.

Activation functions are categorized into linear and non-linear activation functions.

### Linear Activation Functions

As you know, input is passed to the activation function to produce a meaningful output. The linear activation function takes the input and returns an output that changes in direct proportion to it.

It is used because of its simplicity and the clear relationship between input and output. It is especially suitable for straightforward models where the relationship between variables is linear.

Mathematically, it is represented as **f(x) = x**. This is also called the **identity function** or **pass-through function**, which outputs the input as is.

The above picture shows how the linear function appears graphically: a straight line. A linear function doesn’t help the model learn complex patterns in the dataset, so non-linear functions are mostly used with the perceptron.

### Non-Linear Activation Functions

As you know, the non-linear activation function allows the model to learn a complex, non-linear relationship between inputs and outputs.

There are different types of non-linear activation functions, which are described below.

#### Sigmoid Activation Function

When this function is used as an activation function in perceptron (neuron), the input passed to this function is converted into a value between 0 and 1. So, it takes an input value, changes it, and returns a new value between 0 and 1.

Mathematically, it uses the formula **f(x) = 1 / (1 + e^{-x})**.

Let’s understand it with a simple example. Supposing you need to decide if a message is spam or not. You want to assign each message a score; if the score is high, the message will likely be spam.

But here, you want to assign the score so that it should not be too big or too small; I mean, it should be between 0 and 1.

Suppose you pass the number (and call this number **x**) into the sigmoid function, which returns another number, f(x), between 0 and 1. This returned number should be like probability, such as the chance of spam or non-spam.

Think about the mathematical formula of the sigmoid function, which I have mentioned above. Let’s see how it processes the input.

- When you pass the number x to this function, it first makes it negative as -x. Then, it raises a special number called **e** (the value of e is about 2.71828) to the power of -x. By doing this, if x is small (very negative), the result becomes large; if x is large, the result becomes small.
- After that, this function adds 1 to that small or big number, so now we have **1 + e^{-x}**. Lastly, it takes 1 and divides it by **1 + e^{-x}**, which returns the final output, f(x).

Let me show you the actual number;

- Suppose you pass the number 2 into the sigmoid function.
- Then, the function makes it negative as -2 and raises e to the power of -2. The value of e is **2.71828**, so when you compute **e^{-2}**, you get a value of around **0.135**. **Add 1 to 0.135**, and you **get 1.135**; now, **take 1 and divide it by 1.135, and you get the value 0.881.**

So when you pass 2 to the sigmoid function, it returns the value 0.881, which is between 0 and 1; this means an 88.1% chance that the message is spam. Generally, this function is used for binary classification problems.

This is how the sigmoid activation function works. This function is also called the **logistic function**.
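The same steps can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `sigmoid` is my own choice and not a TensorFlow call.

```python
import math

def sigmoid(x):
    """Sigmoid: 1 divided by (1 + e raised to the power of -x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Follow the worked example step by step for x = 2.
exp_term = math.exp(-2)        # about 0.135
denominator = 1 + exp_term     # about 1.135
result = 1 / denominator       # about 0.881

print(round(exp_term, 3), round(denominator, 3), round(result, 3))
print(round(sigmoid(2), 3))    # same value via the helper
print(sigmoid(0))              # 0.5, the midpoint of the output range
```

Large positive inputs push the output towards 1 and large negative inputs push it towards 0, which is why the result can be read as a probability.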

#### Hyperbolic Tangent (Tanh) Activation Function

When the tanh (Hyperbolic tangent) activation function is used in perceptron (neuron), the input passed to this function is converted into a value between -1 and 1. So, it takes an input value, changes that, and returns a new value between -1 and 1.

It is used in the hidden layer of artificial neural networks.

Mathematically, it uses the formula **f(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})**.

Let’s again take the same example of spam, where you must decide whether the message is spam.

A higher score means more likely to be spam, and a lower means less likely to be spam.

So you pass the number (call it x) into the tanh activation function, and it returns another number, f(x), between -1 and 1.

Let’s see how it processes that number or input value.

- When you pass the number x to this function, it creates two copies of that number: positive x and negative -x.
- Then, it raises **e** to the power of each number. Doing this makes two numbers: **e^{x}**, which is large when x is large, and **e^{-x}**, which is large when x is small (very negative).
- After that, it subtracts the second number from the first and also adds the two together. So we have **e^{x} - e^{-x}** at the top and **e^{x} + e^{-x}** at the bottom.
- Lastly, it divides the top number by the bottom number, **(e^{x} - e^{-x}) / (e^{x} + e^{-x})**, and returns the output as f(x).

Let’s see with the actual input number and how it works when input to the tanh activation function,

- You pass the number 1 as input into the tanh activation function; it **makes two numbers, positive 1 and negative -1.**
- Now, this function raises e to the power of each number as **e^{1}** and **e^{-1}**, subtracts the second from the first, which gives **e^{1} - e^{-1}**, and adds the two together, which gives **e^{1} + e^{-1}**.
- The value of **e^{1}** is **2.71828**, and **e^{-1}** is **0.3679**. So **e^{1} - e^{-1}** means **2.71828 - 0.3679**, which gives 2.3504, and **e^{1} + e^{-1}** means **2.71828 + 0.3679**, which gives 3.0862.

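Now **divide 2.3504 by 3.0862 to return the value 0.7616**, which says that if the message score is 1, the tanh activation function maps it to about 0.76 on its -1 to 1 scale. Unlike the sigmoid output, this value is not a probability; a value close to 1 simply leans strongly towards spam, and a value close to -1 leans strongly towards not spam.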

This is how the tanh (hyperbolic tangent) function works internally or processes the input.
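Here is a small sketch of the same calculation in plain Python. The helper name `tanh_manual` is my own; it is compared against `math.tanh` from the standard library as a sanity check.

```python
import math

def tanh_manual(x):
    """Tanh built from its definition: (e^x - e^-x) / (e^x + e^-x)."""
    e_pos = math.exp(x)
    e_neg = math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)

value = tanh_manual(1)
print(round(math.exp(-1), 4))   # about 0.3679
print(round(value, 4))          # about 0.7616

# The hand-written version agrees with the standard library.
print(abs(value - math.tanh(1)) < 1e-12)

# Negative inputs give negative outputs, and 0 maps to 0.
print(round(tanh_manual(-1), 4), tanh_manual(0))
```

Because the output is centred on 0, tanh is often preferred over sigmoid in hidden layers.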

#### ReLU (Rectified Linear Unit) Activation Function

When **ReLU** is used as an activation function in a perceptron, it returns the input value unchanged if that value is positive; otherwise, it returns 0.

That means if the input value is positive, it returns the same number; if it is zero or negative, it returns 0.

Mathematically, it uses the formula **f(x) = max(0, x)**.

For example, if you pass the input value 3 into the ReLU activation function, it checks whether the value is larger than 0; since it is, it returns the same number, 3. If you pass a negative input value, like -3, it returns 0.

This is how the ReLU activation function works when input is passed.
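The formula translates almost directly into Python. This is a minimal sketch; the function name `relu` is my own and is not the TensorFlow implementation.

```python
def relu(x):
    """Return x when it is positive, otherwise 0."""
    return max(0, x)

# Positive inputs pass through unchanged; zero and negative inputs become 0.
results = [relu(v) for v in [3, -3, 0, 0.5]]
print(results)
```

For the inputs 3, -3, 0, and 0.5, the outputs are 3, 0, 0, and 0.5.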

#### Softmax Activation Function

The **Softmax** activation function works with probabilities: when it is used in a layer, it converts the given logits (raw scores) into probabilities by taking the exponential of each score and then normalizing these values so that they sum to 1.

This activation function is used in the output layer of neural networks for multi-class classification tasks. This function takes the input and converts the input value into probabilities.

Mathematically, it uses the formula **f(x_{i}) = e^{x_{i}} / Σ_{j} e^{x_{j}}**, where the sum in the denominator runs over every value in the input list.

Let’s see how it processes the input values when you pass a list of values to the Softmax activation function.

- Suppose the list of values is x, then for each number in the list x, take the special number e and raise it to the power of that number, creating a list of the new numbers.
- Next, it sums all the new numbers to get the total.
- In the end, for each number in the list x, the function divides the new number by the total to get the probability.

Again, take the example of email, but here the model needs to score three possibilities: spam, not spam, or uncertain.

So suppose the model assigns raw scores like:

- Spam: the score is 1.
- Not spam: the score is 0.
- Uncertain: the score is 2.

This gives the list of numbers **[1, 0, 2]**. When you pass this list to the Softmax function, it processes it in the following way.

- For each number in the list **[1, 0, 2]**, it raises e to the power of that number like this: **e^{1}, e^{0}, e^{2}**. Doing this makes a list of new numbers **[e^{1}, e^{0}, e^{2}]**.
- The value of **e^{1}** is **2.718**, **e^{0}** is **1**, and **e^{2}** is **7.389**. This list looks like this: **[2.718, 1, 7.389]**. Next, it sums all the numbers **2.718 + 1 + 7.389** and returns the total, which is **11.107**.
- For each new number in the list **[2.718, 1, 7.389]**, the function divides the new number by the total: **2.718/11.107**, **1/11.107**, and **7.389/11.107**.
- You get three values when you divide: **0.245**, **0.090**, and **0.665**. These three values are the probabilities for the list of numbers that you pass to the Softmax function.

You can say there is a 24.5% chance that the email is spam, a 9.0% chance that it’s not, and a 66.5% chance that it’s uncertain.

The Softmax function is used for multi-class classification, where each input must be assigned to one of several classes; it is mainly used in neural networks that identify an object in an image or recognize the words in a given sentence.
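The exponentiate, sum, and divide steps can be sketched with NumPy as follows. The function name `softmax` is my own, and subtracting the largest score before exponentiating is an extra step I have added for numerical stability; it does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the max is an added stability trick; the ratios are unchanged.
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

# Raw scores for [spam, not spam, uncertain].
probs = softmax([1, 0, 2])
print(np.round(probs, 3))   # approximately 0.245, 0.090, 0.665
print(probs.sum())          # the probabilities add up to 1
```

The largest raw score always gets the largest probability, and the outputs always sum to 1, which is what makes softmax suitable for choosing one class out of several.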

These are the commonly used activation functions, and there are others, but it is enough to give you an overview of how the activation function works.
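TensorFlow ships these functions already, so you rarely need to write them by hand. The sketch below, which assumes TensorFlow 2.x with eager execution, applies `tf.nn.sigmoid`, `tf.nn.tanh`, `tf.nn.relu`, and `tf.nn.softmax` to a few sample values chosen only for illustration.

```python
import tensorflow as tf

# Sample inputs chosen only for illustration.
x = tf.constant([-3.0, 0.0, 1.0, 2.0])

sig = tf.nn.sigmoid(x).numpy()    # values between 0 and 1
tanh = tf.nn.tanh(x).numpy()      # values between -1 and 1
relu = tf.nn.relu(x).numpy()      # negatives become 0

# Softmax works on a whole list of scores at once.
soft = tf.nn.softmax(tf.constant([1.0, 0.0, 2.0])).numpy()

print(sig)
print(tanh)
print(relu)
print(soft)
```

For the input 2, the sigmoid value is about 0.881; for the input 1, the tanh value is about 0.762; and the softmax of [1, 0, 2] is roughly [0.245, 0.090, 0.665].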

## Where to use Activation Functions in Neural Network

So, as you know, neural networks consist of three kinds of layers: the input layer, the hidden layer, and the output layer. Each layer consists of one or more perceptrons (neurons), and each perceptron can have an activation function.

So, choosing what kind of activation function the perceptron should use in each layer depends on the specific task, the dataset’s characteristics, and the problem being solved.

But here, I will explain a general approach to where to use the activation function in neural networks.

**Input Layer:** In the input layer, you can use **ReLu**, **Tanh**, and **Sigmoid**, but in general, this layer doesn’t contain the activation function; it just passes the input to the next layer.

**Hidden Layer:** Use ReLU, Tanh, or Sigmoid; you must use a non-linear activation function here, because the real learning happens in the hidden layer.

**Output Layer:** Use Sigmoid, Softmax, or Linear. It depends on what kind of task the neural network is for. For binary classification, use Sigmoid; for multi-class classification, use Softmax; for regression, use Linear.

Now, you know where to use the activation function in the neural network.
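The output-layer rule of thumb above can be written down as a small lookup helper. This is only a sketch: `output_layer_config` and its task names are hypothetical, and the single output unit for binary classification and regression is my assumption.

```python
def output_layer_config(task, num_classes=None):
    """Suggest the number of units and the activation for the output layer."""
    if task == "binary":
        # One unit whose sigmoid output is read as a probability.
        return {"units": 1, "activation": "sigmoid"}
    if task == "multiclass":
        if num_classes is None:
            raise ValueError("num_classes is required for multi-class tasks")
        # One unit per class; softmax turns the scores into probabilities.
        return {"units": num_classes, "activation": "softmax"}
    if task == "regression":
        # Linear (identity) activation leaves the predicted value unbounded.
        return {"units": 1, "activation": "linear"}
    raise ValueError(f"unknown task: {task}")

print(output_layer_config("binary"))
print(output_layer_config("multiclass", num_classes=10))
print(output_layer_config("regression"))
```

Since the keys match the arguments of a Keras dense layer, a dictionary like this could be unpacked as `tf.keras.layers.Dense(**output_layer_config("binary"))`.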

## Building Model with Tensorflow Activation Functions

Let’s build a neural network model with different activation functions to clarify your concept. So here you learn to build a model that can classify whether email is spam based on the different activation functions.

Remember, this is just an example of how to use the activation function in tensorflow.

First, import the required libraries using the code below.

```
import tensorflow as tf
import numpy as np
```

Create a training and validation dataset using the below code.

```
x_train = np.random.rand(1000, 20)
y_train = np.random.randint(2, size=(1000,))
x_val = np.random.rand(200, 20)
y_val = np.random.randint(2, size=(200,))
```
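Because this data is random, you will get different numbers on every run. If you want repeatable results, one option is a seeded NumPy generator, as in the sketch below; the seed value 42 is arbitrary and the `_seeded` variable names are my own.

```python
import numpy as np

# A seeded generator produces the same "random" data on every run.
rng = np.random.default_rng(42)

x_train_seeded = rng.random((1000, 20))         # 1000 emails, 20 features each
y_train_seeded = rng.integers(0, 2, size=1000)  # labels: 0 (not spam) or 1 (spam)
x_val_seeded = rng.random((200, 20))
y_val_seeded = rng.integers(0, 2, size=200)

print(x_train_seeded.shape, y_train_seeded.shape)
print(np.unique(y_train_seeded))
```

Keep in mind that the labels here have no relationship to the features, so any model trained on this data should stay close to chance (around 50% accuracy for two classes). The dataset only demonstrates how the activation functions are wired in.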

After importing and creating datasets, create an email spam classifier model based on the different activation functions.

### Email Spam Classification with Sigmoid Tensorflow Activation Function

Define the model using the following code.

```
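model_sigmoid = tf.keras.Sequential([
    # First dense layer; input_shape matches the 20 features per email
    tf.keras.layers.Dense(64, activation='sigmoid', input_shape=(x_train.shape[1],)),
    tf.keras.layers.Dense(32, activation='sigmoid'),
    # Single sigmoid unit outputs the probability of spam
    tf.keras.layers.Dense(1, activation='sigmoid')
])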
```

The above code creates a Sequential model with an input, hidden, and output layer; the activation function for each layer is sigmoid.

Compile the model.

`model_sigmoid.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])`

Fit the model or Train the model.

`model_sigmoid.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))`
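After training, the sigmoid output layer gives one value between 0 and 1 per email, and you still need to turn it into a spam or not-spam label. The sketch below shows the usual thresholding step; the `probs` array is a made-up stand-in for what a call such as `model_sigmoid.predict(x_val[:5])` would return, and the 0.5 threshold is an assumption.

```python
import numpy as np

# Stand-in for model predictions with shape (5, 1); these values are invented.
probs = np.array([[0.91], [0.12], [0.50], [0.67], [0.03]])

# Anything strictly above the threshold is labelled spam (1), otherwise not spam (0).
threshold = 0.5
labels = (probs > threshold).astype(int).ravel()

print(labels)
```

With these stand-in values, the first and fourth emails are labelled spam, and the third (exactly 0.50) is not. Raising the threshold makes the classifier more cautious about flagging spam.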

You created the model containing the sigmoid activation function in the input, hidden and output layers and trained the model. Next, build a model with the tanh activation function.

### Email Spam Classification with Tanh Tensorflow Activation Function

Create a model with the tanh activation function, as shown below.

```
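model_tanh = tf.keras.Sequential([
    # tanh in the first two layers keeps their outputs between -1 and 1
    tf.keras.layers.Dense(64, activation='tanh', input_shape=(x_train.shape[1],)),
    tf.keras.layers.Dense(32, activation='tanh'),
    # Output stays sigmoid because the task is binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')
])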
```

The above code defines a model whose input and hidden layers use the tanh activation function and whose output layer uses sigmoid; as I said, the output activation function depends on the problem the model is solving.

Compile the model.

`model_tanh.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])`

Train the model.

`model_tanh.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))`

You created the model using the tanh activation function and the same loss function as before, **binary_crossentropy**. Next, let’s see how to build a model with the ReLU activation function.

### Email Spam Classification with ReLu Tensorflow Activation Function

Create a model with the ReLu activation function, as shown below.

```
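model_relu = tf.keras.Sequential([
    # relu in the first two layers passes positives through and zeroes out negatives
    tf.keras.layers.Dense(64, activation='relu', input_shape=(x_train.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    # Output stays sigmoid because the task is binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')
])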
```

The above code defines a model whose input and hidden layers use the relu activation function, while the output layer uses sigmoid.

Compile the model.

`model_relu.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])`

Train the model.

`model_relu.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))`

You have created the model with relu activation and trained it. Next, build a model with a softmax function.

### Email Spam Classification with Softmax Tensorflow Activation Function

Create a multi-class classification dataset; this one has 10 classes and is random data, just for the sake of example.

```
x_train_multi = np.random.rand(1000, 20)
y_train_multi = np.random.randint(10, size=(1000,))
x_val_multi = np.random.rand(200, 20)
y_val_multi = np.random.randint(10, size=(200,))
```

As shown below, create a model with a softmax activation function in the output layer.

```
model_softmax = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(x_train_multi.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    # 10 units, one per class; softmax turns the scores into probabilities
    tf.keras.layers.Dense(10, activation='softmax')
])
```

The above code creates a model whose input and hidden layers use the relu activation function and whose output layer uses the softmax activation function; note which layer uses softmax.

Compile the model.

`model_softmax.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])`

Train the model.

`model_softmax.fit(x_train_multi, y_train_multi, epochs=10, batch_size=32, validation_data=(x_val_multi, y_val_multi))`

Don’t be confused that this model is called softmax-based even though its input and hidden layers use relu, just like the earlier examples. Softmax belongs in the output layer, where it turns the final scores into class probabilities, so a model is described as softmax-based when its output layer uses softmax.

Here, I am trying to show you how to use the activation functions: some, such as relu and tanh, are mostly used in the input and hidden layers, while others, such as sigmoid and softmax, are mostly used in the output layer.
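A softmax output layer returns one probability per class for every sample, so the predicted class is the position of the largest probability. The sketch below illustrates this with NumPy; the `probs` array is a made-up stand-in for model predictions and uses only 3 classes instead of 10 to keep it readable.

```python
import numpy as np

# Stand-in for softmax predictions: 2 samples, 3 classes (invented values).
probs = np.array([
    [0.05, 0.70, 0.25],
    [0.60, 0.30, 0.10],
])

# Each row sums to 1 because softmax normalizes the scores.
row_sums = probs.sum(axis=1)

# The predicted class is the index of the largest probability in each row.
predicted = probs.argmax(axis=1)

print(row_sums)
print(predicted)
```

Here the first sample is assigned to class 1 and the second to class 0. These integer indices are the same kind of labels that `sparse_categorical_crossentropy` expects as targets.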

I hope that now you have a clear understanding of the activation function and its uses.

## Conclusion

You learned what activation functions are, their types, and where to use them in the layers of a neural network. You also learned how each activation function works, with its mathematical formula.

You even saw how the graph of an activation function appears. Additionally, you built the spam classifier model based on the different activation functions.


I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I started working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I got expertise in various Python libraries also like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc… for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.