Working with TensorFlow often requires converting data between different types, and one of the most common conversions is from string to integer. I’ve encountered this need countless times in my decade of Python development experience.
In real-world machine learning projects, data rarely comes in the perfect format. Sometimes you’ll get text data that needs to be processed numerically.
For example, when working with datasets containing categorical variables like “California”, “Texas”, or “New York”, you’ll need to convert these strings to integers for your model to process them efficiently.
In this article, I’ll walk you through several practical methods to convert strings to integers in TensorFlow.
Methods to Convert String to Int in TensorFlow
Now, I will explain various methods to convert a string to int in TensorFlow.
1: Use tf.strings.to_number()
The simplest way to convert a string to an integer in TensorFlow is to use the tf.strings.to_number() function. This is ideal when your strings represent actual numbers.
import tensorflow as tf
# Create a tensor of strings
string_tensor = tf.constant(["1", "2", "3", "4", "5"])
# Convert strings to integers
int_tensor = tf.strings.to_number(string_tensor, out_type=tf.int32)
print("Original tensor:", string_tensor)
print("Converted tensor:", int_tensor)
When you run this code, you’ll get:
Original tensor: tf.Tensor([b'1' b'2' b'3' b'4' b'5'], shape=(5,), dtype=string)
Converted tensor: tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)

This method works great for strings that directly represent numbers, but what about other types of strings?
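Before moving on, here is a small supplementary sketch of two behaviors of tf.strings.to_number() that are easy to miss. The variable names (big_numbers, parse_failed) are illustrative and not part of the example above. It assumes eager execution, where a string that cannot be parsed raises tf.errors.InvalidArgumentError immediately.

```python
import tensorflow as tf

# Values beyond the int32 range need a wider output type
big_numbers = tf.constant(["3000000000", "42"])
big_ints = tf.strings.to_number(big_numbers, out_type=tf.int64)
print("int64 result:", big_ints)

# A string that is not a number cannot be parsed and raises an error
try:
    tf.strings.to_number(tf.constant(["12", "abc"]), out_type=tf.int32)
    parse_failed = False
except tf.errors.InvalidArgumentError:
    parse_failed = True
print("Parse failed:", parse_failed)
```

If your data may contain values above roughly 2.1 billion, choosing tf.int64 up front avoids a parsing error later.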
2: Use tf.cast() with tf.strings.to_number()
For more complex conversions, you can combine tf.strings.to_number() with tf.cast(). This is useful when you need more control over the data type.
import tensorflow as tf
# Create a tensor of strings representing numbers
string_tensor = tf.constant(["10.5", "20.7", "30.1"])
# First convert to float
float_tensor = tf.strings.to_number(string_tensor, out_type=tf.float32)
# Then cast to integer
int_tensor = tf.cast(float_tensor, tf.int32)
print("Original tensor:", string_tensor)
print("Float tensor:", float_tensor)
print("Integer tensor:", int_tensor)
Output:
Original tensor: tf.Tensor([b'10.5' b'20.7' b'30.1'], shape=(3,), dtype=string)
Float tensor: tf.Tensor([10.5 20.7 30.1], shape=(3,), dtype=float32)
Integer tensor: tf.Tensor([10 20 30], shape=(3,), dtype=int32)

This method is particularly useful when dealing with floating-point numbers stored as strings. Note that tf.cast() truncates toward zero rather than rounding, which is why 20.7 becomes 20.
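If dropping the fractional part is not what you want, you can round or floor before casting. The following is a minimal sketch with made-up sample values (including a negative number to make the difference visible); the variable names are illustrative only.

```python
import tensorflow as tf

string_tensor = tf.constant(["2.7", "-2.7", "7.2"])
float_tensor = tf.strings.to_number(string_tensor, out_type=tf.float32)

# Plain cast: the fractional part is dropped (truncation toward zero)
truncated = tf.cast(float_tensor, tf.int32)
# Round to the nearest integer first, then cast
rounded = tf.cast(tf.round(float_tensor), tf.int32)
# Always round down, then cast
floored = tf.cast(tf.floor(float_tensor), tf.int32)

print("Truncated:", truncated)
print("Rounded:", rounded)
print("Floored:", floored)
```

Which variant is right depends on what the numbers mean in your data, so it is worth choosing deliberately rather than relying on the default cast.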
3: Handle Categorical Data with tf.lookup.StaticHashTable
When working with categorical data like U.S. state names, we often need to map them to integers. Here’s how to do it:
import tensorflow as tf
# Create a tensor of U.S. states
states = tf.constant(["California", "Texas", "New York", "Florida"])
# Create a mapping dictionary
state_mapping = {
    "California": 0,
    "Texas": 1,
    "New York": 2,
    "Florida": 3
}
# Convert the mapping to TensorFlow keys and values
keys = tf.constant(list(state_mapping.keys()))
values = tf.constant(list(state_mapping.values()), dtype=tf.int32)
# Create a hash table
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=-1)
# Look up the values
state_indices = table.lookup(states)
print("Original states:", states)
print("Encoded indices:", state_indices)
Output:
Original states: tf.Tensor([b'California' b'Texas' b'New York' b'Florida'], shape=(4,), dtype=string)
Encoded indices: tf.Tensor([0 1 2 3], shape=(4,), dtype=int32)

This method is excellent for categorical data processing in real machine learning pipelines.
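One detail worth seeing in action is the default_value argument: any key that is not in the table maps to it. The sketch below rebuilds the same state table and looks up two strings that are not in the mapping (the query values and the known mask are my own illustration, not part of the example above).

```python
import tensorflow as tf

keys = tf.constant(["California", "Texas", "New York", "Florida"])
values = tf.constant([0, 1, 2, 3], dtype=tf.int32)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=-1)

# "Ohio" is not in the table, and lookups are case-sensitive
queries = tf.constant(["Texas", "Ohio", "texas"])
indices = table.lookup(queries)
# Boolean mask marking which queries were found
known = tf.not_equal(indices, -1)

print("Indices:", indices)
print("Known:", known)
```

Because lookups are exact string matches, you may want to normalize case and whitespace (for example with tf.strings.lower and tf.strings.strip) before the lookup.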
4: Use tf.keras.layers.StringLookup for Preprocessing
If you’re building a model using Keras, the StringLookup layer provides a convenient way to convert strings to integers:
import tensorflow as tf
from tensorflow.keras.layers import StringLookup
# Sample dataset of U.S. cities
cities = tf.constant(["Chicago", "New York", "Los Angeles", "Miami", "Chicago", "New York"])
# Create a StringLookup layer
lookup_layer = StringLookup(output_mode='int')
# Adapt the layer to the data
lookup_layer.adapt(cities)
# Convert strings to integers
city_indices = lookup_layer(cities)
print("Original cities:", cities)
print("Encoded indices:", city_indices)
print("Vocabulary:", lookup_layer.get_vocabulary())
Output (the exact indices depend on how the layer orders the vocabulary, which is by descending frequency, so your numbers may differ):
Original cities: tf.Tensor([b'Chicago' b'New York' b'Los Angeles' b'Miami' b'Chicago' b'New York'], shape=(6,), dtype=string)
Encoded indices: tf.Tensor([1 3 2 4 1 3], shape=(6,), dtype=int64)
Vocabulary: ['[UNK]', 'Chicago', 'Los Angeles', 'New York', 'Miami']
The StringLookup layer automatically creates a vocabulary and assigns an index to each unique string. With the default settings, index 0 is reserved for the out-of-vocabulary token '[UNK]'; a mask token such as the empty string is added to the vocabulary only if you set mask_token.
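If you need indices that are stable across runs, one option is to pass a fixed vocabulary instead of calling adapt(). The sketch below assumes the default settings (one out-of-vocabulary index, no mask token); the query list, including the unseen city "Boston", is my own illustration.

```python
import tensorflow as tf
from tensorflow.keras.layers import StringLookup

# Fixed vocabulary: indices follow the order given here, starting at 1
vocab = ["Chicago", "New York", "Los Angeles", "Miami"]
fixed_lookup = StringLookup(vocabulary=vocab)

# "Boston" is not in the vocabulary, so it maps to the OOV index 0
queries = tf.constant(["Chicago", "New York", "Boston"])
city_ids = fixed_lookup(queries)

print("Encoded indices:", city_ids)
print("Vocabulary:", fixed_lookup.get_vocabulary())
```

With a fixed vocabulary, the mapping no longer depends on the frequencies in whatever data the layer was adapted on, which makes saved models and evaluation runs easier to compare.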
5: Convert String Digits in a Ragged Tensor
Sometimes you might have ragged tensors (tensors whose rows can have different lengths) containing string digits that need conversion:
import tensorflow as tf
# Create a ragged tensor of string digits
ragged_strings = tf.ragged.constant([["1", "2", "3"], ["4", "5"], ["6"]])
# Define a function to convert strings to integers
def convert_strings_to_ints(string_tensor):
    return tf.strings.to_number(string_tensor, out_type=tf.int32)
# Apply the function to the flat values, preserving the ragged row structure
ragged_ints = tf.ragged.map_flat_values(convert_strings_to_ints, ragged_strings)
print("Original ragged tensor:", ragged_strings)
print("Converted ragged tensor:", ragged_ints)
Output:
Original ragged tensor: <tf.RaggedTensor [[b'1', b'2', b'3'], [b'4', b'5'], [b'6']]>
Converted ragged tensor: <tf.RaggedTensor [[1, 2, 3], [4, 5], [6]]>
This is particularly useful when processing text data with varying lengths, such as sentences or documents.
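Ragged tensors of string digits often come from splitting lines of text. As a hedged sketch of that situation, the snippet below splits whitespace-separated numbers with tf.strings.split (which returns a ragged tensor) and converts the pieces in one step. The sample lines are invented for illustration.

```python
import tensorflow as tf

# Each line holds a different number of whitespace-separated values
lines = tf.constant(["1 2 3", "4 5", "6"])

# Splitting produces a ragged tensor of string tokens
tokens = tf.strings.split(lines)

# Extra keyword arguments are forwarded to the mapped function
ragged_ints = tf.ragged.map_flat_values(
    tf.strings.to_number, tokens, out_type=tf.int32)

print("Tokens:", tokens)
print("Integers:", ragged_ints)
```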
Handle Errors During Conversion
In real-world scenarios, you’ll often encounter invalid inputs. Here’s how to handle them gracefully:
import tensorflow as tf
# Create a tensor with both valid and invalid string numbers
mixed_strings = tf.constant(["1", "2", "three", "4", "five"])
# tf.strings.to_number raises InvalidArgumentError for strings like "three".
# Converting each element inside tf.py_function lets us catch that error
# with ordinary Python exception handling and substitute a default.
def convert_with_default(string_tensor):
    def process_element(s):
        try:
            return tf.strings.to_number(s, out_type=tf.int32)
        except tf.errors.InvalidArgumentError:
            # Not a valid integer string: fall back to -1
            return tf.constant(-1, dtype=tf.int32)
    return tf.map_fn(
        lambda x: tf.py_function(process_element, [x], tf.int32),
        string_tensor,
        fn_output_signature=tf.int32
    )
# Apply the conversion
result = convert_with_default(mixed_strings)
print("Original tensor:", mixed_strings)
print("Converted tensor with defaults:", result)
While this approach works, it’s worth noting that tf.py_function runs Python code for every element, so it can be slow on large tensors, and the Python function body is not serialized with the graph when you export the model.
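As an alternative that stays inside TensorFlow ops, you can validate the strings with a regular expression first and only parse the ones that match. This is a sketch under stated assumptions, not a drop-in replacement: the helper name to_int_or_default is hypothetical, and the pattern deliberately accepts only an optional minus sign followed by 1 to 9 digits so that every accepted value fits in int32.

```python
import tensorflow as tf

def to_int_or_default(strings, default=-1):
    """Parse integer strings; entries that do not look like integers get `default`."""
    # Optional minus sign plus 1 to 9 digits always fits in int32
    is_valid = tf.strings.regex_full_match(strings, r"-?[0-9]{1,9}")
    # Replace invalid entries with a placeholder that is safe to parse
    parseable = tf.where(is_valid, strings, tf.constant("0"))
    parsed = tf.strings.to_number(parseable, out_type=tf.int32)
    # Put the default back wherever the original string was invalid
    return tf.where(is_valid, parsed, tf.constant(default, dtype=tf.int32))

mixed_strings = tf.constant(["1", "2", "three", "4", "five"])
safe_result = to_int_or_default(mixed_strings)
print("Converted tensor with defaults:", safe_result)
```

Because everything here is a vectorized TensorFlow op, the function can be used inside tf.function or a tf.data pipeline without falling back to Python.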
Real-World Example: Process U.S. Census Data
Let’s look at a more comprehensive example processing U.S. census data:
import tensorflow as tf
# Sample U.S. census data (simplified)
# Format: [State, Population (as string), Year, Income Level (as string)]
census_data = tf.constant([
    ["California", "39500000", "2020", "High"],
    ["Texas", "29000000", "2020", "Medium"],
    ["New York", "19500000", "2020", "High"],
    ["Florida", "21500000", "2020", "Medium"],
    ["Illinois", "12700000", "2020", "Medium"]
])
# Extract columns
states = census_data[:, 0]
populations = census_data[:, 1]
years = census_data[:, 2]
income_levels = census_data[:, 3]
# Convert population strings to integers
population_ints = tf.strings.to_number(populations, out_type=tf.int32)
# Convert income levels to integers using lookup
income_mapping = {
    "Low": 0,
    "Medium": 1,
    "High": 2
}
# Create lookup table
keys = tf.constant(list(income_mapping.keys()))
values = tf.constant(list(income_mapping.values()), dtype=tf.int32)
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values),
    default_value=-1
)
# Convert income levels to integers
income_ints = table.lookup(income_levels)
print("States:", states)
print("Population (integers):", population_ints)
print("Income levels (encoded):", income_ints)
This example demonstrates how you might process multiple string columns in a real dataset, converting them to appropriate integer representations.
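A natural next step, sketched here as an assumption about how you might use the results, is to convert the year column as well and stack the integer columns into a single numeric feature tensor. The snippet uses a two-row subset of the census data so that it runs on its own; the name features is illustrative.

```python
import tensorflow as tf

census_data = tf.constant([
    ["California", "39500000", "2020", "High"],
    ["Texas", "29000000", "2020", "Medium"],
])

# Numeric columns: parse directly
population_ints = tf.strings.to_number(census_data[:, 1], out_type=tf.int32)
year_ints = tf.strings.to_number(census_data[:, 2], out_type=tf.int32)

# Categorical column: map through a lookup table
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        tf.constant(["Low", "Medium", "High"]),
        tf.constant([0, 1, 2], dtype=tf.int32)),
    default_value=-1)
income_ints = table.lookup(census_data[:, 3])

# One row per record, one column per feature
features = tf.stack([population_ints, year_ints, income_ints], axis=1)
print("Features:", features)
```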
Performance Considerations
When working with large datasets, performance becomes critical. Here are some tips:
- Use TensorFlow’s built-in operations whenever possible
- Avoid Python loops and prefer vectorized operations
- Consider using tf.data.Dataset for efficient data pipelines
- Perform string-to-int conversions as early as possible in your pipeline
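To make these tips concrete, here is a minimal sketch of a tf.data pipeline that batches raw strings first and then converts a whole batch at a time, so the conversion stays vectorized. The sample values and batch size are placeholders chosen for illustration.

```python
import tensorflow as tf

raw_values = tf.constant(["1", "2", "3", "4"])

dataset = (
    tf.data.Dataset.from_tensor_slices(raw_values)
    .batch(2)  # batch first so the conversion runs once per batch
    .map(lambda s: tf.strings.to_number(s, out_type=tf.int32),
         num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)

# Collect the batches as plain Python lists for inspection
batches = [batch.tolist() for batch in dataset.as_numpy_iterator()]
print(batches)
```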
Converting strings to integers in TensorFlow is a fundamental skill that’s essential for preprocessing data for machine learning models. The methods I’ve shared come from years of practical experience and should cover most use cases you’ll encounter.
Remember that the best approach depends on your specific needs. For simple numeric strings, tf.strings.to_number() works great. For categorical data, consider using lookup tables or the StringLookup layer.
I hope you found this guide helpful!
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.