I was cleaning up some text data for a project where I had to prepare a dataset for analysis.
The dataset came from multiple sources across the USA, including customer feedback forms and survey responses.
The issue? Many of these text fields had non-ASCII characters like emojis, accented letters, or symbols. When I tried to process this data, my scripts failed, and I realized I needed a way to remove or filter out these characters.
In this tutorial, I’ll show you seven simple methods I use to remove non-ASCII characters from strings in Python.
What Are Non-ASCII Characters?
ASCII stands for American Standard Code for Information Interchange. It defines 128 characters (code points 0 through 127): English letters, digits, punctuation, and a set of control characters.
Anything outside this range (like é, ü, ©) is considered non-ASCII. When working with data in the USA, you’ll often want to strip out these characters to keep everything clean and compatible with systems that expect plain ASCII text.
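Before removing anything, it can help to see which characters in a string fall outside the ASCII range. The snippet below is a minimal sketch; the helper name find_non_ascii is my own and not a built-in.

```python
def find_non_ascii(text):
    """Return (index, character, code point) for every non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 127  # ASCII ends at code point 127
    ]


sample = "Café ©"
print(find_non_ascii(sample))
# [(3, 'é', 'U+00E9'), (5, '©', 'U+00A9')]
```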
Method 1 – Use encode() and decode()
One of the most common ways I remove non-ASCII characters is by encoding the Python string into ASCII and ignoring errors.
text = "Café in New York 😊 costs $5"
clean_text = text.encode("ascii", "ignore").decode()
print(clean_text)
Output:
Caf in New York  costs $5

Here, the non-ASCII characters (é and 😊) are removed. This method is fast and works well when you just want plain ASCII text.
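The second argument to encode() is the error handler. As a small sketch of the alternatives, the snippet below compares "ignore" with "replace", which puts a question mark wherever a character could not be encoded, so you can see where something was dropped.

```python
text = "Café in New York 😊 costs $5"

# "ignore" silently drops every character that ASCII cannot represent
dropped = text.encode("ascii", "ignore").decode("ascii")

# "replace" puts a "?" in place of each such character
marked = text.encode("ascii", "replace").decode("ascii")

print(dropped)  # Caf in New York  costs $5
print(marked)   # Caf? in New York ? costs $5
```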
Method 2 – Use Regular Expressions (re.sub)
If I want more control, I use regular expressions.
import re
text = "Résumé: José lives in Los Ángeles 🌎"
clean_text = re.sub(r'[^\x00-\x7F]+', '', text)
print(clean_text)
Output:
Rsum: Jos lives in Los ngeles

The regex [^\x00-\x7F] matches all non-ASCII characters and removes them. This method is reliable and works when I want to strip everything outside of ASCII.
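Removing characters can leave double or trailing spaces behind. One possible follow-up, sketched here with a hypothetical helper named strip_non_ascii, is to precompile the pattern and then collapse the leftover spaces.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")
EXTRA_SPACES = re.compile(r" {2,}")


def strip_non_ascii(text):
    """Remove non-ASCII characters, then tidy up the spaces left behind."""
    without = NON_ASCII.sub("", text)
    return EXTRA_SPACES.sub(" ", without).strip()


print(strip_non_ascii("Résumé: José lives in Los Ángeles 🌎"))
# Rsum: Jos lives in Los ngeles
print(strip_non_ascii("Café in New York 😊 costs $5"))
# Caf in New York costs $5
```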
Method 3 – Use isascii() (Python 3.7+)
Python 3.7 introduced the handy str.isascii() method, which returns True when every character in a string is ASCII.
text = "Python is fun 🐍!"
clean_text = ''.join(char for char in text if char.isascii())
print(clean_text)
Output:
Python is fun !

This is my go-to method when I want a Pythonic one-liner. It’s clean, readable, and efficient.
Method 4 – Use filter() with str.isascii
Another simple approach is to use filter() with isascii in Python.
text = "Curaçao is a beautiful island 🌴"
clean_text = ''.join(filter(str.isascii, text))
print(clean_text)
Output:
Curaao is a beautiful island

This works similarly to Method 3 but uses filter() for readability.
Method 5 – Use unidecode Library
Sometimes, I don’t want to just remove characters; I want to convert them to ASCII equivalents.
For example, é should become e.
For this, I use the third-party unidecode library, which you can install with pip install Unidecode.
from unidecode import unidecode
text = "Café in Montréal costs €10"
clean_text = unidecode(text)
print(clean_text)
Output:
Cafe in Montreal costs EUR10
This is useful when working with names, addresses, or city names in the USA that may include accented characters.
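If installing a third-party package is not an option, a rough standard-library approximation is to decompose accented letters with unicodedata.normalize() and then drop whatever is still non-ASCII. This is a different technique from unidecode, not a replacement for it: symbols with no decomposition, such as €, are removed rather than spelled out. The helper name to_ascii_stdlib is hypothetical.

```python
import unicodedata


def to_ascii_stdlib(text):
    """Approximate transliteration using only the standard library."""
    # NFKD splits "é" into "e" plus a separate combining accent mark
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks (and anything else non-ASCII) are then dropped
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_stdlib("Café in Montréal costs €10"))
# Cafe in Montreal costs 10
```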
Method 6 – Use str.translate()
Python’s translate() method allows me to remove unwanted characters using a translation table.
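import sys
text = "Zoë bought piñata 🎉 for $20"
# Map every code point above 127 to None so that translate() deletes it
table = dict.fromkeys(range(128, sys.maxunicode + 1))
clean_text = text.translate(table)
print(clean_text)
Output:
Zo bought piata  for $20
This is a bit advanced, but it gives me full control over which characters to strip. The table has to reach sys.maxunicode, because emojis such as 🎉 (U+1F389) sit far above the accented letters and a smaller range would leave them in place.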
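Because a translation table is just a mapping from characters to replacements, translate() can also substitute chosen characters instead of deleting them. The sketch below uses a small hand-picked mapping of my own; extend it to suit your data.

```python
# Hypothetical mapping: replace two accented letters, delete one emoji
custom_table = str.maketrans({"ë": "e", "ñ": "n", "🎉": None})

text = "Zoë bought piñata 🎉 for $20"
custom_clean = text.translate(custom_table)
print(custom_clean)
# Zoe bought pinata  for $20
```

Characters that are not in the mapping pass through unchanged, so this approach suits cases where you know exactly which characters to expect.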
Method 7 – Use map() with isascii
Finally, I sometimes use map() with isascii for functional-style programming in Python.
text = "François works in San José 🚗"
clean_text = ''.join(map(lambda c: c if c.isascii() else '', text))
print(clean_text)
Output:
Franois works in San Jos
This method is flexible and works well if you like functional programming patterns.
Which Method Should You Use?
- Quick cleanup: Use encode() and decode() (Method 1).
- Regex lovers: Use re.sub() (Method 2).
- Modern Python: Use isascii() (Method 3 or 4).
- Need transliteration: Use unidecode (Method 5).
- Custom filtering: Use translate() or map() (Methods 6 & 7).
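For plain removal, the built-in approaches should all give the same result. A quick way to convince yourself, sketched below with a sample string of my own, is to run several of them side by side and compare the outputs.

```python
import re

text = "Café in New York 😊 costs $5"

results = {
    "encode/decode": text.encode("ascii", "ignore").decode("ascii"),
    "re.sub": re.sub(r"[^\x00-\x7F]+", "", text),
    "isascii": "".join(ch for ch in text if ch.isascii()),
    "filter": "".join(filter(str.isascii, text)),
}

for name, cleaned in results.items():
    print(f"{name:>13}: {cleaned!r}")

# True when every removal-based method agrees on the cleaned string
print(len(set(results.values())) == 1)
```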
While Python doesn’t have a single built-in function dedicated to removing non-ASCII characters, you can easily achieve it with these seven methods.
I use encode() when I need a quick cleanup, and unidecode when I want to keep text readable for USA-based datasets.
No matter which method you choose, the key is to understand your data cleaning goal: Do you want to just remove characters, or do you want to convert them into usable ASCII equivalents?

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.