TensorFlow Convert String To Int

Working with TensorFlow often requires converting data between different types, and one of the most common conversions is from string to integer. I’ve encountered this need countless times in my decade of Python development experience.

In real-world machine learning projects, data rarely comes in the perfect format. Sometimes you’ll get text data that needs to be processed numerically.

For example, when working with datasets containing categorical variables like “California”, “Texas”, or “New York”, you’ll need to convert these strings to integers for your model to process them efficiently.

In this article, I’ll walk you through several practical methods to convert strings to integers in TensorFlow.

Methods to Convert String to Int in TensorFlow

Now, I will explain various methods to convert a string to int in TensorFlow.

1: Use tf.strings.to_number()

The simplest way to convert a string to an integer in TensorFlow is to use the tf.strings.to_number() function. This is ideal when your strings represent actual numbers.

import tensorflow as tf

# Create a tensor of strings
string_tensor = tf.constant(["1", "2", "3", "4", "5"])

# Convert strings to integers
int_tensor = tf.strings.to_number(string_tensor, out_type=tf.int32)

print("Original tensor:", string_tensor)
print("Converted tensor:", int_tensor)

When you run this code, you’ll get:

Original tensor: tf.Tensor([b'1' b'2' b'3' b'4' b'5'], shape=(5,), dtype=string)
Converted tensor: tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

This method works great for strings that directly represent numbers, but what about other types of strings?
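One mild variation is numeric strings padded with whitespace, as often happens with exported text files. The following is a minimal sketch, assuming hypothetical padded inputs, that trims them with tf.strings.strip() before parsing and requests tf.int64 for extra range:

```python
import tensorflow as tf

# Hypothetical inputs with stray leading/trailing whitespace
raw = tf.constant([" 42", "7 ", "  1000  "])

# Trim whitespace first so the parser only sees the digits
cleaned = tf.strings.strip(raw)

# out_type also accepts tf.int64 when values may exceed the int32 range
ints = tf.strings.to_number(cleaned, out_type=tf.int64)

print("Cleaned strings:", cleaned)
print("Parsed integers:", ints)
```

Stripping explicitly keeps the behavior obvious to readers of the code, whatever the parser itself tolerates.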

2: Use tf.cast() with tf.strings.to_number()

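When the strings hold decimal values but you need integers, combine tf.strings.to_number() with tf.cast(): parse to a float first, then cast to an integer type. Requesting out_type=tf.int32 directly would fail on a string like "10.5", because it is not a valid integer literal.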

import tensorflow as tf

# Create a tensor of strings representing numbers
string_tensor = tf.constant(["10.5", "20.7", "30.1"])

# First convert to float
float_tensor = tf.strings.to_number(string_tensor, out_type=tf.float32)

# Then cast to integer
int_tensor = tf.cast(float_tensor, tf.int32)

print("Original tensor:", string_tensor)
print("Float tensor:", float_tensor)
print("Integer tensor:", int_tensor)

Output:

Original tensor: tf.Tensor([b'10.5' b'20.7' b'30.1'], shape=(3,), dtype=string)
Float tensor: tf.Tensor([10.5 20.7 30.1], shape=(3,), dtype=float32)
Integer tensor: tf.Tensor([10 20 30], shape=(3,), dtype=int32)

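This method is particularly useful when dealing with floating-point numbers stored as strings. Note that tf.cast() drops the fractional part rather than rounding, which is why "20.7" becomes 20.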
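If you want the nearest integer instead of dropping the fractional part, tf.round() can be applied before the cast. Here is a minimal sketch with hypothetical inputs; tf.round() rounds exact halves to the nearest even number, so the values below deliberately avoid .5:

```python
import tensorflow as tf

# Hypothetical decimal strings, including a negative value
string_tensor = tf.constant(["10.4", "20.7", "-3.9"])
float_tensor = tf.strings.to_number(string_tensor, out_type=tf.float32)

# Plain cast: the fractional part is discarded (truncation toward zero)
truncated = tf.cast(float_tensor, tf.int32)

# Round first, then cast: the nearest integer is kept
rounded = tf.cast(tf.round(float_tensor), tf.int32)

print("Truncated:", truncated)
print("Rounded:", rounded)
```

Which of the two is right depends on what the numbers mean; counts usually want rounding, while bucket boundaries often want truncation.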

3: Handle Categorical Data with tf.lookup.StaticHashTable

When working with categorical data like U.S. state names, we often need to map them to integers. Here’s how to do it:

import tensorflow as tf

# Create a tensor of U.S. states
states = tf.constant(["California", "Texas", "New York", "Florida"])

# Create a mapping dictionary
state_mapping = {
    "California": 0,
    "Texas": 1, 
    "New York": 2,
    "Florida": 3
}

# Convert the mapping to TensorFlow keys and values
keys = tf.constant(list(state_mapping.keys()))
values = tf.constant(list(state_mapping.values()), dtype=tf.int32)

# Create a hash table
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=-1)

# Look up the values
state_indices = table.lookup(states)

print("Original states:", states)
print("Encoded indices:", state_indices)

Output:

Original states: tf.Tensor([b'California' b'Texas' b'New York' b'Florida'], shape=(4,), dtype=string)
Encoded indices: tf.Tensor([0 1 2 3], shape=(4,), dtype=int32)

This method is excellent for categorical data processing in real machine learning pipelines.
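The default_value argument matters as soon as the data contains a key that is missing from the mapping. The sketch below reuses the same state mapping and looks up a hypothetical unseen state ("Ohio") to show how unknown entries can be detected afterwards:

```python
import tensorflow as tf

state_mapping = {"California": 0, "Texas": 1, "New York": 2, "Florida": 3}

table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        tf.constant(list(state_mapping.keys())),
        tf.constant(list(state_mapping.values()), dtype=tf.int32),
    ),
    default_value=-1,
)

# "Ohio" is not in the mapping, so the lookup returns default_value for it
queries = tf.constant(["Texas", "Ohio", "Florida"])
indices = table.lookup(queries)

# A boolean mask makes it easy to count or filter unknown keys
unknown_mask = tf.equal(indices, -1)

print("Indices:", indices)
print("Unknown:", unknown_mask)
```

Choosing a default that can never be a real index (such as -1) keeps unknown keys distinguishable from valid ones.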

4: Use tf.keras.layers.StringLookup for Preprocessing

If you’re building a model using Keras, the StringLookup layer provides a convenient way to convert strings to integers:

import tensorflow as tf
from tensorflow.keras.layers import StringLookup

# Sample dataset of U.S. cities
cities = tf.constant(["Chicago", "New York", "Los Angeles", "Miami", "Chicago", "New York"])

# Create a StringLookup layer
lookup_layer = StringLookup(output_mode='int')

# Adapt the layer to the data
lookup_layer.adapt(cities)

# Convert strings to integers
city_indices = lookup_layer(cities)

print("Original cities:", cities)
print("Encoded indices:", city_indices)
print("Vocabulary:", lookup_layer.get_vocabulary())

Output:

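Original cities: tf.Tensor([b'Chicago' b'New York' b'Los Angeles' b'Miami' b'Chicago' b'New York'], shape=(6,), dtype=string)
Encoded indices: tf.Tensor([1 3 2 4 1 3], shape=(6,), dtype=int64)
Vocabulary: ['[UNK]', 'Chicago', 'Los Angeles', 'New York', 'Miami']

The StringLookup layer automatically creates a vocabulary and assigns an index to each unique string. With the default settings, index 0 is reserved for out-of-vocabulary values and appears in the vocabulary as '[UNK]'; an empty-string entry for missing values is only added when you set mask_token. The remaining entries are ordered by how often each string occurs, so the exact indices you see may differ from those shown here.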
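If the vocabulary is known ahead of time, it can be passed to the layer directly instead of calling adapt(). The following is a minimal sketch, assuming the default settings (one out-of-vocabulary index, no mask token) and a hypothetical three-city vocabulary; it also shows the invert=True option for mapping indices back to strings:

```python
import tensorflow as tf

# Hypothetical fixed vocabulary; order here determines the indices
vocab = ["Chicago", "New York", "Miami"]

lookup = tf.keras.layers.StringLookup(vocabulary=vocab)
inverse = tf.keras.layers.StringLookup(vocabulary=vocab, invert=True)

# "Boston" is not in the vocabulary, so it maps to the OOV index 0
ids = lookup(tf.constant(["Miami", "Boston", "Chicago"]))
restored = inverse(ids)

print("Indices:", ids)
print("Vocabulary:", lookup.get_vocabulary())
print("Restored:", restored)
```

Fixing the vocabulary up front keeps the indices stable between training runs, which adapt() on changing data does not guarantee.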

5: Convert String Digits in a Ragged Tensor

Sometimes you might have ragged tensors (tensors whose rows can have different lengths) containing string digits that need conversion:

import tensorflow as tf

# Create a ragged tensor of string digits
ragged_strings = tf.ragged.constant([["1", "2", "3"], ["4", "5"], ["6"]])

# Define a function to convert strings to integers
def convert_strings_to_ints(string_tensor):
    return tf.strings.to_number(string_tensor, out_type=tf.int32)

# Apply the function to the flat values, keeping the ragged row structure
ragged_ints = tf.ragged.map_flat_values(convert_strings_to_ints, ragged_strings)

print("Original ragged tensor:", ragged_strings)
print("Converted ragged tensor:", ragged_ints)

Output:

Original ragged tensor: <tf.RaggedTensor [[b'1', b'2', b'3'], [b'4', b'5'], [b'6']]>
Converted ragged tensor: <tf.RaggedTensor [[1, 2, 3], [4, 5], [6]]>

This is particularly useful when processing text data with varying lengths, such as sentences or documents.
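Ragged tensors of digit strings often come from splitting lines of text. As a sketch, assuming hypothetical space-separated input lines, tf.strings.split() produces the ragged tensor and tf.ragged.map_flat_values() forwards the out_type keyword to the conversion:

```python
import tensorflow as tf

# Hypothetical lines of space-separated numbers with different lengths
lines = tf.constant(["1 2 3", "4 5", "6"])

# Splitting a batch of strings yields a RaggedTensor of tokens
tokens = tf.strings.split(lines)

# Extra keyword arguments are passed through to the mapped function
numbers = tf.ragged.map_flat_values(
    tf.strings.to_number, tokens, out_type=tf.int32
)

print("Tokens:", tokens)
print("Numbers:", numbers)
```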

Handle Errors During Conversion

In real-world scenarios, you’ll often encounter invalid inputs. Here’s how to handle them gracefully:

import tensorflow as tf

# Create a tensor with both valid and invalid string numbers
mixed_strings = tf.constant(["1", "2", "three", "4", "five"])

# tf.py_function runs process_element eagerly, so an ordinary Python
# try/except can catch the error raised for non-numeric strings
def convert_with_default(string_tensor):
    def process_element(s):
        try:
            return tf.strings.to_number(s, out_type=tf.int32)
        except tf.errors.InvalidArgumentError:
            # Fall back to -1 for strings that are not valid integers
            return tf.constant(-1, dtype=tf.int32)

    # Process one element at a time; fn_output_signature replaces the
    # deprecated dtype argument of tf.map_fn
    return tf.map_fn(
        lambda x: tf.py_function(process_element, [x], tf.int32),
        string_tensor,
        fn_output_signature=tf.int32
    )

# Apply the conversion
result = convert_with_default(mixed_strings)

print("Original tensor:", mixed_strings)
print("Converted tensor with defaults:", result)

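While this approach works, tf.py_function runs ordinary Python code for every element, so it is slow on large tensors. The Python function is also not serialized with the graph, which makes it a poor fit for models that must be exported and served elsewhere.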
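A vectorized alternative is to validate the strings with a regular expression and only then parse them. The sketch below is one possible approach, not the only one: it assumes that valid inputs are plain base-10 integers with an optional minus sign, and it substitutes a placeholder "0" for invalid entries so that the parser never sees them:

```python
import tensorflow as tf

mixed_strings = tf.constant(["1", "2", "three", "-4", "five"])

# True where the whole string is an optional minus sign followed by digits
is_int = tf.strings.regex_full_match(mixed_strings, r"-?[0-9]+")

# Swap invalid entries for a harmless placeholder before parsing
parseable = tf.where(is_int, mixed_strings, tf.constant("0"))
parsed = tf.strings.to_number(parseable, out_type=tf.int32)

# Put the default value (-1) back wherever the input was invalid
result = tf.where(is_int, parsed, tf.constant(-1, dtype=tf.int32))

print("Valid mask:", is_int)
print("Converted tensor with defaults:", result)
```

Everything here is a built-in TensorFlow op, so it processes the whole tensor at once. The pattern does not guard against digit strings too large for int32; those would still need a length check or out_type=tf.int64.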

Real-World Example: Process U.S. Census Data

Let’s look at a more comprehensive example processing U.S. census data:

import tensorflow as tf

# Sample U.S. census data (simplified)
# Format: [State, Population (as string), Year, Income Level (as string)]
census_data = tf.constant([
    ["California", "39500000", "2020", "High"],
    ["Texas", "29000000", "2020", "Medium"],
    ["New York", "19500000", "2020", "High"],
    ["Florida", "21500000", "2020", "Medium"],
    ["Illinois", "12700000", "2020", "Medium"]
])

# Extract columns
states = census_data[:, 0]
populations = census_data[:, 1]
years = census_data[:, 2]
income_levels = census_data[:, 3]

# Convert population strings to integers
population_ints = tf.strings.to_number(populations, out_type=tf.int32)

# Convert income levels to integers using lookup
income_mapping = {
    "Low": 0,
    "Medium": 1,
    "High": 2
}

# Create lookup table
keys = tf.constant(list(income_mapping.keys()))
values = tf.constant(list(income_mapping.values()), dtype=tf.int32)

table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=-1
)

# Convert income levels to integers
income_ints = table.lookup(income_levels)

print("States:", states)
print("Population (integers):", population_ints)
print("Income levels (encoded):", income_ints)

This example demonstrates how you might process multiple string columns in a real dataset, converting them to appropriate integer representations.
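Two details are worth a short follow-up: the year column can be converted the same way, and tf.int32 tops out at 2,147,483,647, so larger counts need tf.int64. A minimal sketch with illustrative values (the numbers below are placeholders, not census figures):

```python
import tensorflow as tf

# Illustrative year strings, parsed exactly like the population column
years = tf.constant(["2020", "2020", "2010"])
year_ints = tf.strings.to_number(years, out_type=tf.int32)

# Illustrative counts; the second one is above the int32 maximum
large_counts = tf.constant(["331449281", "7800000000"])
count_ints = tf.strings.to_number(large_counts, out_type=tf.int64)

print("Years:", year_ints)
print("Counts (int64):", count_ints)
```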

Performance Considerations

When working with large datasets, performance becomes critical. Here are some tips:

Use TensorFlow’s built-in operations whenever possible
Avoid Python loops and prefer vectorized operations
Consider using tf.data.Dataset for efficient data pipelines
Perform string-to-int conversions as early as possible in your pipeline
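The tips above can be combined in a small tf.data pipeline. This is a sketch under simple assumptions (an in-memory list of numeric strings and a hypothetical batch size of 2); batching before map() lets the conversion run once per batch instead of once per element:

```python
import tensorflow as tf

# Hypothetical in-memory source; real pipelines would read from files
raw = tf.data.Dataset.from_tensor_slices(["1", "2", "3", "4", "5"])

# Batch first, then convert each batch with a single vectorized op
converted = raw.batch(2).map(
    lambda s: tf.strings.to_number(s, out_type=tf.int32),
    num_parallel_calls=tf.data.AUTOTUNE,
)

batches = [batch.numpy().tolist() for batch in converted]
print(batches)
```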

Converting strings to integers in TensorFlow is a fundamental skill that’s essential for preprocessing data for machine learning models. The methods I’ve shared come from years of practical experience and should cover most use cases you’ll encounter.

Remember that the best approach depends on your specific needs. For simple numeric strings, tf.strings.to_number() works great. For categorical data, consider using lookup tables or the StringLookup layer.

I hope you found this guide helpful!

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.
