Recently, I was working on a data-cleaning project where I had to process customer feedback collected from different regions of the USA.
The challenge? Many of the text files contained special Unicode characters like emojis, accented letters, and symbols that I didn’t need in my analysis.
I quickly realized that removing Unicode characters in Python is not as easy as it sounds. Over the years, I’ve tried different approaches, and in this tutorial, I’ll share the most reliable methods I use.
Method 1 – Use encode() and decode() in Python
One of the simplest ways I remove unwanted Unicode characters is by encoding the string into ASCII and then decoding it back.
Here’s how I do it:
# Method 1: Using encode() and decode()
text = "Python is easy to learn 😊 but sometimes tricky ñ!"
print("Original:", text)
# Encode to ASCII and ignore errors
clean_text = text.encode("ascii", "ignore").decode("ascii")
print("Cleaned:", clean_text)

Output:

Original: Python is easy to learn 😊 but sometimes tricky ñ!
Cleaned: Python is easy to learn but sometimes tricky !

This method works great when I just want plain ASCII text and don’t care about losing accented letters or emojis.
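A small variation worth knowing: encode() accepts other error handlers besides "ignore". The snippet below is my own illustration (not from the article's dataset) using "replace", which substitutes a "?" for each dropped character so the data loss stays visible instead of silent:

```python
# "ignore" silently drops non-ASCII characters; "replace" marks each
# one with "?", making it easy to spot where data was lost.
text = "Café ☕"
print(text.encode("ascii", "ignore").decode("ascii"))   # Caf
print(text.encode("ascii", "replace").decode("ascii"))  # Caf? ?
```

I reach for "replace" during exploratory cleaning, then switch to "ignore" once I'm confident nothing important is being thrown away.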
Method 2 – Use Regular Expressions (re)
Sometimes, I need more control, especially when I want to keep certain characters but remove others. That’s where regex comes in handy.
# Method 2: Using regex to remove non-ASCII characters
import re
text = "Café prices went up by 5% ☕ in New York!"
print("Original:", text)
clean_text = re.sub(r'[^\x00-\x7F]+', '', text)
print("Cleaned:", clean_text)

Output:

Original: Café prices went up by 5% ☕ in New York!
Cleaned: Caf prices went up by 5% in New York!

I often use this when cleaning survey data from US customers, where accented characters sneak in.
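When I want the opposite trade-off, keeping accented letters like "é" but dropping emojis, a range-based pattern works. This is a sketch under the assumption that the code-point ranges below cover your data; they include the common emoji and symbol blocks but are not an exhaustive list:

```python
import re

# Remove common emoji/symbol blocks only, leaving accented Latin
# letters (like "é") untouched. Extend the ranges for your own data.
emoji_pattern = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]+")

text = "Café prices went up by 5% ☕ in New York!"
print(emoji_pattern.sub("", text))
```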
Method 3 – Use Python’s str.translate() Method
Another neat trick I use is Python’s translate() method. It allows me to map characters I don’t want to None.
# Method 3: Using str.translate()
import string
text = "Order #123 – shipped to São Paulo 🚚"
print("Original:", text)
# Create a mapping table to remove non-ASCII characters
clean_text = text.translate({ord(c): None for c in text if ord(c) > 127})
print("Cleaned:", clean_text)

Output:

Original: Order #123 – shipped to São Paulo 🚚
Cleaned: Order #123 shipped to So Paulo

This method gives me flexibility when I want to filter out specific ranges of characters.
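If you already know exactly which characters to drop, str.maketrans() can build the table up front instead of scanning the string in a comprehension. A minimal sketch (the sample text and character set are my own):

```python
# The third argument to str.maketrans() is a string of characters
# that translate() will map to None, i.e. delete.
table = str.maketrans("", "", "–—😊")

text = "Report – Q3 results 😊"
print(text.translate(table))
```

This is handy when the same cleanup runs over many strings, since the table is built once and reused.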
Method 4 – Use Python's unicodedata Module
If I want to normalize text (for example, convert “é” into a plain “e” instead of dropping it entirely), I use the unicodedata module.
# Method 4: Using unicodedata to normalize text
import unicodedata
text = "Résumé submitted for job in San José"
print("Original:", text)
# Normalize and remove non-ASCII
normalized_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
print("Cleaned:", normalized_text)

Output:

Original: Résumé submitted for job in San José
Cleaned: Resume submitted for job in San Jose

This is my go-to method when I want to preserve readability while still cleaning up Unicode.
Which Method Should You Use?
- Use encode/decode if you want a quick and simple cleanup.
- Use regex if you need fine-grained control.
- Use translate() if you want to filter based on ASCII ranges.
- Use unicodedata if you want to normalize accented characters into plain text.
Most of the time, I use unicodedata when cleaning data for reports, because it keeps words readable while removing unwanted Unicode. But if I’m preprocessing data for machine learning, regex is usually my first choice.
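If it helps, the choice between the two can be folded into one small helper. This is an illustrative sketch combining the regex and unicodedata methods above; the function name and the fold_accents flag are my own invention, not a standard API:

```python
import re
import unicodedata

def clean_text(s: str, fold_accents: bool = False) -> str:
    """Strip non-ASCII characters; optionally fold accents first."""
    if fold_accents:
        # NFKD splits "é" into "e" + a combining accent, so the base
        # letter survives the ASCII filter below.
        s = unicodedata.normalize("NFKD", s)
    return re.sub(r"[^\x00-\x7F]+", "", s)

print(clean_text("Résumé 😊", fold_accents=True))  # Resume
print(clean_text("Résumé 😊"))                     # Rsum
```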
So, that’s how I remove Unicode characters in Python. Try these methods in your own projects, and you’ll see how much easier text processing becomes.
You may read other Python tutorials:
- Call Super Constructors with Arguments in Python
- Use Python Class Constructors with Parameters
- Call a Base Class Constructor with Arguments in Python
- Check if an Object is Iterable in Python

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.