TensorFlow One_Hot Encoding

In this TensorFlow tutorial, I will show you how to use the TensorFlow one-hot encoding function, tf.one_hot().

This function is useful because it converts categorical data into numerical values that machine learning algorithms can work with effectively.

I was building an image classifier in TensorFlow to classify 10 different kinds of animals. I collected and prepared the data, but it wasn’t in the proper format for the model: the labels were still in text form, which the model couldn’t use directly.

The problem was converting each animal’s label into a numerical format so that the data could be fed to the model. Luckily, I found TensorFlow’s tf.one_hot() function, which converted these labels into numerical values.

So, in this tutorial, I explain how to use tf.one_hot() from scratch, with its syntax and simple examples.

What does One_Hot Encoding mean?

One-hot encoding converts categorical data into a numerical format that can be fed to a machine learning model.

But how does one-hot encoding work? Suppose your data has a fixed set of categories. One-hot encoding represents each category as a binary vector whose length equals the number of categories, with a 1 at that category’s position and 0 everywhere else.

Let me show you an example. You have two categories: elephant and eagle. If the elephant takes the first position and the eagle takes the second, you can represent these categories as the list [‘elephant’, ‘eagle’].

With one-hot encoding, each category in this list becomes a binary vector: the elephant is [1, 0], and the eagle is [0, 1].

Here, elephant = [1, 0] and eagle = [0, 1], so you have converted the categorical data (‘elephant’ and ‘eagle’) into numerical data ([1, 0] and [0, 1]). That is how one-hot encoding works: each category is encoded as a numerical vector.
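To make the idea concrete, here is a minimal pure-Python sketch of one-hot encoding that works for any number of categories, without TensorFlow. The helper name one_hot_encode is hypothetical, chosen only for this illustration.

```python
def one_hot_encode(labels, categories):
    """Return a one-hot vector (a list of ints) for each label.

    The position of each category is its index in `categories`; every
    vector has len(categories) entries with a single 1 at that position.
    """
    position = {name: i for i, name in enumerate(categories)}
    vectors = []
    for label in labels:
        vector = [0] * len(categories)  # start with every entry "cold"
        vector[position[label]] = 1     # mark this category's slot "hot"
        vectors.append(vector)
    return vectors


categories = ["elephant", "eagle"]
print(one_hot_encode(["elephant", "eagle"], categories))  # [[1, 0], [0, 1]]
```

Because each category’s position comes from its place in the categories list, an unknown label raises a KeyError in this sketch.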


Now you can work with this numerical data in Python.

TensorFlow One_Hot Encoding

But what if you have more than two categories, such as 5, 10, or 50? Encoding them manually quickly becomes impractical.

For that, TensorFlow provides the tf.one_hot() function, which converts categorical data into numerical values like the ones above.

The tf.one_hot() function requires two arguments: the integer indices of the categories and the depth (the number of categories). It returns one one-hot vector per index; for a 1-D list of indices, that is a binary matrix with one row per index and depth columns.
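As a sketch of the more-than-two case, assume five categories indexed 0 to 4 and a list of labels in which some categories repeat (the index assignment here is arbitrary). tf.one_hot() returns one row per label and one column per category:

```python
import tensorflow as tf

# Labels for four samples drawn from five categories (indices 0-4).
labels = [2, 0, 4, 2]
encoded = tf.one_hot(labels, depth=5)

print(encoded.shape)    # one row per label, one column per category
print(encoded.numpy())  # rows 0 and 3 are identical: both are category 2
```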

The syntax of tf.one_hot is given below.

tf.one_hot(
    indices,
    depth,
    on_value=None,
    off_value=None,
    axis=None,
    dtype=None,
    name=None
)

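Where the parameters are:

  • indices: A tensor of integer indices. Each value gives the position of the ‘hot’ entry in its output vector; an index outside [0, depth) produces a vector filled entirely with off_value.
  • depth: A scalar giving the length of each one-hot vector, that is, the number of categories.
  • on_value: The value placed at the index position. Defaults to 1 if not provided.
  • off_value: The value placed at all other positions. Defaults to 0 if not provided.
  • axis: The axis along which the one-hot dimension is added. Defaults to -1, which appends it as the innermost axis.
  • dtype: The data type of the output tensor. If it is not given, it is inferred from on_value or off_value, and defaults to tf.float32 when neither is given.
  • name: An optional name for the operation.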
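The optional parameters are easiest to understand side by side. The following sketch uses illustrative variable names and includes an out-of-range index (-1) to show that such a row is filled with off_value.

```python
import tensorflow as tf

# -1 lies outside [0, depth), so its row contains only off_value.
indices = [0, 2, -1, 1]

# Custom on/off values; with no dtype given, it is inferred from them (int32).
custom = tf.one_hot(indices, depth=3, on_value=5, off_value=-1)
print(custom.numpy())

# axis=0 adds the one-hot dimension first: shape (depth, len(indices)).
transposed = tf.one_hot(indices, depth=3, axis=0)
print(transposed.shape)

# An explicit dtype overrides the float32 default.
as_int = tf.one_hot(indices, depth=3, dtype=tf.int32)
print(as_int.dtype)
```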

For example, suppose you have the same categories, elephant and eagle.

The first step is to import TensorFlow using the code below.

import tensorflow as tf

As you know, tf.one_hot() accepts the indices of the categories, so let’s create them. Suppose the index of the elephant is 0 and the index of the eagle is 1, as shown below.

# elephant (0) and eagle (1)
category_indices = [0, 1]

tf.one_hot() also accepts a second parameter, depth (the number of categories), so set the depth of the one-hot dimension to 2.

depth = 2

Now pass these two values, category_indices and depth, to the tf.one_hot() function, as shown below.

encoded_data = tf.one_hot(category_indices, depth)

Print the values of encoded_data.

print(encoded_data.numpy())
Output:

[[1. 0.]
 [0. 1.]]

The output is a binary matrix with two rows: [1. 0.] represents the elephant and [0. 1.] represents the eagle. The values appear as 1. and 0. because the output dtype defaults to float32.


One-hot encoding uses two terms, hot and cold. If you look at the binary matrix, it contains two separate rows:

  • The elephant is represented as [1. 0.] because its index is 0, so the 1 sits in the first position; the eagle is represented as [0. 1.] because its index is 1, so the 1 sits in the second position.
  • In each row, only one position is ‘hot’, marked as 1, and the rest are ‘cold’, marked as 0.

This is how you can use TensorFlow’s tf.one_hot() function to convert categorical data into a binary matrix (numerical data).
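In a real project, such as the 10-animal classifier described at the start, labels usually arrive as text, so they must be mapped to indices before calling tf.one_hot(). Here is a minimal sketch with four hypothetical class names; tf.argmax() recovers each index from its one-hot row.

```python
import tensorflow as tf

# Hypothetical class names; a real dataset would supply its own list.
class_names = ["elephant", "eagle", "lion", "zebra"]
raw_labels = ["eagle", "zebra", "elephant", "eagle"]

# Each label's index is its position in class_names.
index_of = {name: i for i, name in enumerate(class_names)}
label_indices = [index_of[name] for name in raw_labels]  # [1, 3, 0, 1]

one_hot_labels = tf.one_hot(label_indices, depth=len(class_names))
print(one_hot_labels.numpy())

# The position of the 1 in each row gives the original index back.
decoded = tf.argmax(one_hot_labels, axis=1)
print([class_names[i] for i in decoded.numpy()])  # ['eagle', 'zebra', 'elephant', 'eagle']
```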

But why convert categories into numerical values with TensorFlow’s tf.one_hot() function at all?

There are several reasons. First, machine learning models work on numerical data, not on text or descriptive labels. One-hot encoding converts text categories into a numerical form that models can process and analyze.

The second reason is to avoid misinterpreting the data. Suppose you assign numbers such as 1 to the elephant and 2 to the eagle. If a model works on these numbers directly, it may treat the eagle as greater than the elephant, even though the categories have no such order.

One-hot encoding prevents this by treating each category equally: every one-hot vector is the same distance from every other, so no category is ranked above another.
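A quick way to see this is to compare distances. In the sketch below, the integer codes (including a third class, lion) are hypothetical: with integers, lion ends up twice as far from elephant as eagle does, while the one-hot vectors are all the same Euclidean distance (the square root of 2) apart.

```python
import tensorflow as tf

# Naive integer codes (hypothetical): they imply an order and spacing.
codes = {"elephant": 1, "eagle": 2, "lion": 3}
print(abs(codes["eagle"] - codes["elephant"]))  # 1
print(abs(codes["lion"] - codes["elephant"]))   # 2

# One-hot vectors for the same three categories (indices 0, 1, 2).
vectors = tf.one_hot([0, 1, 2], depth=3)

def distance(a, b):
    """Euclidean distance between rows a and b of `vectors`."""
    return float(tf.norm(vectors[a] - vectors[b]))

print(distance(0, 1), distance(0, 2), distance(1, 2))  # each is about 1.414
```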

Third, your data must be compatible with the machine learning model. Neural network models in TensorFlow operate on numerical tensors, and tf.one_hot() converts labels into a format these models can use directly.
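As one concrete case, Keras’s CategoricalCrossentropy loss expects one-hot targets, while SparseCategoricalCrossentropy computes the same loss from integer indices. The sketch below uses random logits as a stand-in for a model’s output (an assumption for illustration only) and checks that the two losses agree.

```python
import tensorflow as tf

num_classes = 10                             # e.g. ten animal classes
labels = tf.constant([0, 3, 9])              # integer class indices
logits = tf.random.normal((3, num_classes))  # stand-in for model outputs

# CategoricalCrossentropy takes one-hot targets...
one_hot_targets = tf.one_hot(labels, depth=num_classes)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
loss_one_hot = cce(one_hot_targets, logits)

# ...SparseCategoricalCrossentropy takes the integer indices directly.
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_sparse = scce(labels, logits)

print(float(loss_one_hot), float(loss_sparse))  # equal up to float rounding
```

The sparse loss skips the explicit encoding step; one-hot targets are what you need when a loss or metric expects a full vector per example.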

Next, I will explain how tf.one_hot is used in Natural Language Processing.


TensorFlow One_Hot Encoding in Natural Language Processing

In natural language processing, you can use TensorFlow’s one-hot encoding to represent words or characters. For example, suppose you have a text and need to encode each character for a neural network model.

You can use tf.one_hot() to encode those characters. Let’s say you have the characters n, l, and p.

The first step is to create indices for these characters and specify the depth (the number of distinct characters in this case).

# n is (0), l is (1), p is (2)

char_indices = [0, 1, 2]
depth_numchars = 3

Now pass char_indices and depth_numchars to the tf.one_hot() function, as shown below.

char_encoding = tf.one_hot(char_indices, depth_numchars)
print(char_encoding)

The printed tensor shows the characters converted into numerical data: n = [1. 0. 0.], l = [0. 1. 0.], and p = [0. 0. 1.].

This is just a simple example of using the tf.one_hot() to encode the characters into numerical data in Natural Language Processing.
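For longer text, you can build the character indices automatically instead of writing them by hand. This minimal sketch assumes a hypothetical input string and assigns indices in order of first appearance, which reproduces n = 0, l = 1, p = 2 from the example above.

```python
import tensorflow as tf

text = "nlpln"  # hypothetical input text

# Distinct characters in order of first appearance: ['n', 'l', 'p'].
vocab = list(dict.fromkeys(text))
char_to_index = {ch: i for i, ch in enumerate(vocab)}

# One index per character of the text: [0, 1, 2, 1, 0].
indices = [char_to_index[ch] for ch in text]

# One one-hot row per character; depth is the vocabulary size.
encoded_text = tf.one_hot(indices, depth=len(vocab))
print(encoded_text.numpy())
```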

After working through the examples above and the concepts behind the tf.one_hot() function, I hope you understand how to use one-hot encoding in TensorFlow.

Conclusion

In this TensorFlow tutorial, you learned how to use the TensorFlow one-hot encoding function, tf.one_hot(), to convert categorical or textual data into numerical data.

You learned what one-hot encoding is and how it works, with an example. You also converted the two categories ‘elephant’ and ‘eagle’ into numerical form, and encoded the characters n, l, and p for a natural language processing task.
