How to Remove Unicode Characters in Python

Recently, I was working on a data-cleaning project where I had to process customer feedback collected from different regions of the USA.

The challenge? Many of the text files contained special Unicode characters like emojis, accented letters, and symbols that I didn’t need in my analysis.

I quickly realized that removing Unicode characters in Python is not as easy as it sounds. Over the years, I’ve tried different approaches, and in this tutorial, I’ll share the most reliable methods I use.

Method 1 – Use encode() and decode() in Python

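One of the simplest ways I remove unwanted Unicode characters is to encode the string to ASCII bytes with the "ignore" error handler, which silently drops every character that ASCII cannot represent, and then decode the bytes back into a string.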

Here’s how I do it:

# Method 1: Using encode() and decode()

text = "Python is easy to learn 😊 but sometimes tricky ñ!"
print("Original:", text)

# Encode to ASCII, dropping characters ASCII cannot represent, then decode back to str
clean_text = text.encode("ascii", "ignore").decode("ascii")

print("Cleaned:", clean_text)

Output:

Original: Python is easy to learn 😊 but sometimes tricky ñ!
Cleaned: Python is easy to learn  but sometimes tricky !

This method works great when I just want plain ASCII text and don’t care about losing accented letters or emojis.
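
Two side effects of this approach are easy to miss: each removed character leaves its neighboring spaces behind (note the double space after "learn" above), and nothing marks where text was lost. The sketch below is one way to handle both. The helper name to_ascii and its placeholder flag are hypothetical; the only library behavior it relies on is the standard "replace" error handler of str.encode(), which substitutes "?" for each character that cannot be encoded.

```python
import re


def to_ascii(text: str, placeholder: bool = False) -> str:
    """Hypothetical helper: drop non-ASCII characters (or replace each with '?')."""
    errors = "replace" if placeholder else "ignore"
    ascii_text = text.encode("ascii", errors).decode("ascii")
    # A character removed from between two spaces leaves a double space,
    # so collapse runs of spaces and trim the ends.
    return re.sub(r" {2,}", " ", ascii_text).strip()


text = "Python is easy to learn 😊 but sometimes tricky ñ!"
print(to_ascii(text))                    # Python is easy to learn but sometimes tricky !
print(to_ascii(text, placeholder=True))  # each dropped character shows up as '?'
```

Collapsing spaces is a judgment call: skip that step if exact spacing matters, for example in fixed-width data.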

Method 2 – Use Regular Expressions (re)

Sometimes, I need more control, especially when I want to keep certain characters but remove others. That’s where regex comes in handy.

# Method 2: Using regex to remove non-ASCII characters
import re

text = "Café prices went up by 5% ☕ in New York!"
print("Original:", text)

clean_text = re.sub(r'[^\x00-\x7F]+', '', text)

print("Cleaned:", clean_text)

Output:

Original: Café prices went up by 5% ☕ in New York!
Cleaned: Caf prices went up by 5%  in New York!

I often use this when cleaning survey data from US customers, where accented characters sneak in.
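
The pattern above removes every non-ASCII character, so in practice it behaves like Method 1. Regex starts to pay off when the character class describes what to keep. As an illustrative assumption, the sketch below keeps Unicode letters and digits (in Python 3, \w matches them in str patterns, including "é"), whitespace, and a short list of ASCII punctuation, and deletes everything else, such as emojis. The helper name strip_symbols and the exact set of kept punctuation are hypothetical; adjust them for your data.

```python
import re

# Hypothetical "keep" list: word characters (\w, Unicode-aware), whitespace,
# and a few ASCII punctuation marks. Anything else matches and is deleted.
DROP_PATTERN = re.compile(r"[^\w\s.,!?%$#'-]+")


def strip_symbols(text: str) -> str:
    """Hypothetical helper: remove emojis and symbols but keep accented letters."""
    cleaned = DROP_PATTERN.sub("", text)
    # Tidy the double spaces left where a symbol used to be.
    return re.sub(r" {2,}", " ", cleaned).strip()


text = "Café prices went up by 5% ☕ in New York!"
print(strip_symbols(text))  # Café prices went up by 5% in New York!
```

Because \w also matches letters from other scripts (Cyrillic, CJK, and so on), this keeps them too; an explicit range such as \u00C0-\u00FF in the class is one way to allow only Latin-1 accented letters.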

Method 3 – Use Python’s str.translate() Method

Another neat trick I use is Python’s str.translate() method. It takes a table that maps Unicode code points (the integers returned by ord()) to replacement strings, or to None to delete those characters.

# Method 3: Using str.translate()

text = "Order #123 – shipped to São Paulo 🚚"
print("Original:", text)

# Create a mapping table to remove non-ASCII characters
clean_text = text.translate({ord(c): None for c in text if ord(c) > 127})

print("Cleaned:", clean_text)

Output:

Original: Order #123 – shipped to São Paulo 🚚
Cleaned: Order #123  shipped to So Paulo

This method gives me flexibility when I want to filter out specific ranges of characters.
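
Because the table can map a code point to a string as well as to None, translate() can also substitute ASCII look-alikes instead of deleting everything. The sketch below assumes a small, hypothetical ASCII_STANDINS map (the dashes and curly quotes are written as \u escapes, since they are hard to tell apart on screen); any other non-ASCII character is still deleted, as in the example above. The function name is also hypothetical.

```python
# Hypothetical stand-in map: typographic characters with an obvious ASCII
# equivalent. Extend it for your own data.
ASCII_STANDINS = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def to_ascii_with_standins(text: str) -> str:
    # Map each non-ASCII code point in this text to its stand-in, or to None
    # (dict.get's default), which tells translate() to delete it.
    table = {ord(c): ASCII_STANDINS.get(c) for c in set(text) if ord(c) > 127}
    return text.translate(table)


text = "Order #123 – shipped to São Paulo 🚚"
print(repr(to_ascii_with_standins(text)))  # 'Order #123 - shipped to So Paulo '
```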

Method 4 – Use Python’s unicodedata Module

If I want to normalize text (for example, turn “é” into a plain “e” instead of deleting it), I use the unicodedata module.

# Method 4: Using unicodedata to normalize text
import unicodedata

text = "Résumé submitted for job in San José"
print("Original:", text)

# Decompose accented letters (é -> e + accent mark), then drop everything non-ASCII
normalized_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

print("Cleaned:", normalized_text)

Output:

Original: Résumé submitted for job in San José
Cleaned: Resume submitted for job in San Jose

This is my go-to method when I want to preserve readability while still cleaning up Unicode.
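
It helps to know why this works. NFKD normalization splits a precomposed letter such as "é" into a base letter plus a combining accent (U+0301), and it also expands compatibility characters such as the "ﬁ" ligature into "fi"; encoding to ASCII with "ignore" then drops the accents. One caveat: letters that have no decomposition, such as "ø" or "ß", are dropped entirely. If you only want to strip accents and leave other characters alone, one variation is to remove just the nonspacing marks (Unicode category "Mn"). This is a sketch, and strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove accents but keep every other character."""
    # NFKD: "é" -> "e" + U+0301 COMBINING ACUTE ACCENT, "ﬁ" -> "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except nonspacing marks (category "Mn"), which is
    # where the accents end up after decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


for word in ["Résumé", "San José", "København"]:
    print(word, "->", strip_accents(word))
```

Unlike the encode/decode version, the result is not guaranteed to be ASCII: "København" keeps its "ø" instead of becoming "Kbenhavn".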

Which Method Should You Use?

Use encode/decode if you want a quick and simple cleanup.
Use regex if you need fine-grained control.
Use translate() if you want to delete or replace specific characters or code-point ranges.
Use unicodedata if you want to normalize accented characters into plain text.
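
To see the trade-off on a single string, the sketch below runs the four one-liners from this tutorial side by side on a made-up sample that mixes accented letters, an en dash, and an emoji. Printing with repr() makes the leftover double and trailing spaces visible.

```python
import re
import unicodedata

sample = "Café – São José 😊"

results = {
    "encode/decode": sample.encode("ascii", "ignore").decode("ascii"),
    "regex": re.sub(r"[^\x00-\x7F]+", "", sample),
    "translate": sample.translate({ord(c): None for c in sample if ord(c) > 127}),
    "unicodedata": unicodedata.normalize("NFKD", sample)
    .encode("ascii", "ignore")
    .decode("ascii"),
}

for name, value in results.items():
    print(f"{name}: {value!r}")
# encode/decode: 'Caf  So Jos '
# regex: 'Caf  So Jos '
# translate: 'Caf  So Jos '
# unicodedata: 'Cafe  Sao Jose '
```

The first three methods produce identical results here; only unicodedata keeps the base letters of the accented characters.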

Most of the time, I use unicodedata when cleaning data for reports, because it keeps words readable while removing unwanted Unicode. But if I’m preprocessing data for machine learning, regex is usually my first choice.

So, that’s how I remove Unicode characters in Python. Try these methods in your own projects, and you’ll see how much easier text processing becomes.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.