When working with real-world data, I often encounter messy text containing a mix of numbers and other characters. Sometimes, I need to extract the numeric values from these strings for calculations or analysis.
Pandas makes this data cleaning process much easier, but there’s no single built-in function called “remove_non_numeric()”. Instead, we need to use a combination of methods to get the job done.
In this tutorial, I’ll share four different methods for removing all non-numeric characters in Pandas, along with practical examples from my decade of Python experience.
Methods to Remove All Non-numeric Characters in Pandas
Now, I will explain how to remove all non-numeric characters in Pandas.
Method 1: Use str.replace() with Regular Expressions
The simplest way to remove non-numeric characters is to use Pandas’ string method str.replace() with a regular expression pattern.
Here’s how you can do it:
import pandas as pd
# Sample DataFrame with mixed string data
df = pd.DataFrame({
'Product_ID': ['ABC123', 'DEF456', 'GHI789'],
'Price': ['$99.99', '€49.95', '£29.99'],
'Phone': ['(555) 123-4567', '555.987.6543', '555-321-7890']
})
# Remove non-numeric characters from the Phone column
df['Phone_Clean'] = df['Phone'].str.replace(r'\D', '', regex=True)
print(df)
Output:
Product_ID Price Phone Phone_Clean
0 ABC123 $99.99 (555) 123-4567 5551234567
1 DEF456 €49.95 555.987.6543 5559876543
2 GHI789 £29.99 555-321-7890 5553217890

In this example, the pattern \D matches any non-digit character, and str.replace() substitutes an empty string for every match.
This method is simple and works well for most cases. The regex=True parameter is important to ensure the pattern is interpreted as a regular expression.
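To illustrate why regex=True matters, here is a minimal sketch contrasting it with a literal replacement. (Note: in pandas 2.0 and later, regex=False is the default for str.replace().)

```python
import pandas as pd

s = pd.Series(['(555) 123-4567'])

# With regex=True, r'\D' is interpreted as a pattern matching any non-digit
print(s.str.replace(r'\D', '', regex=True).iloc[0])   # 5551234567

# With regex=False, the same call searches for the literal two-character
# text '\D', which never occurs, so nothing is removed
print(s.str.replace(r'\D', '', regex=False).iloc[0])  # (555) 123-4567
```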
Method 2: Use Lambda Function with filter()
Another approach is to use a lambda function with Python's built-in filter() function to keep only the numeric characters:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'Order_ID': ['ORD-12345', 'ORD-67890', 'ORD-24680'],
'Amount': ['$1,234.56', '$789.01', '$2,468.10'],
})
# Remove non-numeric characters using lambda and filter
df['Order_ID_Clean'] = df['Order_ID'].apply(lambda x: ''.join(filter(str.isdigit, x)))
df['Amount_Clean'] = df['Amount'].apply(lambda x: ''.join(filter(str.isdigit, x)))
print(df)
Output:
Order_ID Amount Order_ID_Clean Amount_Clean
0 ORD-12345 $1,234.56 12345 123456
1 ORD-67890 $789.01 67890 78901
2 ORD-24680 $2,468.10 24680 246810

This method uses Python’s built-in filter() function along with str.isdigit() to keep only the digit characters from each string. The join() method then combines these characters back into a string.
Notice that this method removes all non-digit characters, including decimal points and commas. This works great for IDs but may not be suitable for monetary values, where you need to keep the decimal point.
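If you do need to keep decimal points with this approach, one option is a custom predicate instead of str.isdigit. The `keep` helper below is a hypothetical name for illustration, not part of the tutorial's code:

```python
import pandas as pd

df = pd.DataFrame({'Amount': ['$1,234.56', '$789.01', '$2,468.10']})

# Hypothetical predicate that keeps digits AND the decimal point,
# so monetary values survive the filter intact
keep = lambda c: c.isdigit() or c == '.'
df['Amount_Clean'] = df['Amount'].apply(lambda x: ''.join(filter(keep, x)))
print(df['Amount_Clean'].tolist())  # ['1234.56', '789.01', '2468.10']
```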
Method 3: Use pd.to_numeric() with errors='coerce'
If your goal is to convert strings to actual numeric values (not just strings containing only digits), Pandas provides a convenient function called to_numeric():
import pandas as pd
# Sample DataFrame with numeric values as strings
df = pd.DataFrame({
'Sales': ['$1,200', '$3,450', 'N/A', '$890'],
'Units': ['120 pcs', '345 pcs', 'Out of stock', '89 pcs']
})
# First remove currency symbols, commas, and other characters
df['Sales_Clean'] = df['Sales'].str.replace(r'[^\d.]', '', regex=True)
df['Units_Clean'] = df['Units'].str.replace(r'\D', '', regex=True)
# Convert to actual numeric values
df['Sales_Numeric'] = pd.to_numeric(df['Sales_Clean'], errors='coerce')
df['Units_Numeric'] = pd.to_numeric(df['Units_Clean'], errors='coerce')
print(df)
Output:
Sales Units Sales_Clean Units_Clean Sales_Numeric Units_Numeric
0 $1,200 120 pcs 1200 120 1200.0 120.0
1 $3,450 345 pcs 3450 345 3450.0 345.0
2 N/A Out of stock NaN NaN
3 $890 89 pcs 890 89 890.0 89.0

This approach is particularly useful when you want to perform mathematical operations on the cleaned data. The errors='coerce' parameter converts any invalid numeric strings to NaN values rather than raising an error.
I’m using two different regex patterns here:
- [^\d.] matches any character that’s not a digit or decimal point
- \D matches any non-digit character
The first pattern is better for monetary values, where you want to keep decimal points, while the second is better for whole numbers.
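A quick side-by-side sketch with the standard re module shows the difference between the two patterns on the same value:

```python
import re

price = '$1,234.56'

# [^\d.] removes everything except digits and the decimal point
print(re.sub(r'[^\d.]', '', price))  # 1234.56

# \D removes every non-digit, including the decimal point
print(re.sub(r'\D', '', price))      # 123456
```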
Method 4: Use str.extract() to Pull Out Numeric Portions
If you need to extract specific numeric patterns from strings, Pandas’ str.extract() method is very useful:
import pandas as pd
# Sample DataFrame with product codes and measurements
df = pd.DataFrame({
'Product': ['iPhone 13 Pro', 'Samsung Galaxy S22', 'Google Pixel 6'],
'Dimensions': ['146.7 x 71.5 x 7.65 mm', '146.0 x 70.6 x 7.6 mm', '158.6 x 74.8 x 8.9 mm']
})
# Extract the first number from each product name
df['Product_Number'] = df['Product'].str.extract(r'(\d+)')
# Extract all three dimensions separately
df[['Height', 'Width', 'Thickness']] = df['Dimensions'].str.extract(r'([\d.]+)\s*x\s*([\d.]+)\s*x\s*([\d.]+)')
# Convert the extracted strings to float
df[['Height', 'Width', 'Thickness']] = df[['Height', 'Width', 'Thickness']].astype(float)
print(df)
Output:
Product Dimensions Product_Number Height Width Thickness
0 iPhone 13 Pro 146.7 x 71.5 x 7.65 mm 13 146.7 71.5 7.65
1 Samsung Galaxy S22 146.0 x 70.6 x 7.6 mm 22 146.0 70.6 7.60
2 Google Pixel 6 158.6 x 74.8 x 8.9 mm 6 158.6 74.8 8.90
This method is particularly useful when you need to extract specific numeric patterns from more complex strings. The regular expression pattern inside extract() uses capture groups (the parts in parentheses) to pull out the exact numbers you want.
The beauty of this approach is that it can handle complex patterns and extract multiple numeric values at once, as shown with the dimensions example.
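When the number of values per row varies, a related option (not shown in the tutorial) is str.findall(), which returns every match as a list per row rather than requiring a fixed set of capture groups:

```python
import pandas as pd

s = pd.Series(['146.7 x 71.5 x 7.65 mm'])

# str.findall returns all matches of the pattern for each row as a list
print(s.str.findall(r'[\d.]+').iloc[0])  # ['146.7', '71.5', '7.65']
```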
Handle Special Cases: Negative Numbers and Decimals
When working with financial or scientific data, you might need to preserve negative signs and decimal points:
import pandas as pd
# Sample DataFrame with financial data
df = pd.DataFrame({
'Amount': ['+$1,234.56', '-$789.01', '$2,468.10'],
'Change': ['+15.2%', '-7.8%', '+0.3%']
})
# Preserve negative signs and decimal points
df['Amount_Clean'] = df['Amount'].str.replace(r'[^\d.-]', '', regex=True)
df['Change_Clean'] = df['Change'].str.replace(r'[^\d.-]', '', regex=True)
# Convert to numeric values
df['Amount_Numeric'] = pd.to_numeric(df['Amount_Clean'])
df['Change_Numeric'] = pd.to_numeric(df['Change_Clean'])
print(df)
Output:
Amount Change Amount_Clean Change_Clean Amount_Numeric Change_Numeric
0 +$1,234.56 +15.2% 1234.56 15.2 1234.56 15.2
1 -$789.01 -7.8% -789.01 -7.8 -789.01 -7.8
2 $2,468.10 +0.3% 2468.10 0.3 2468.10 0.3
In this example, I used the pattern [^\d.-], which preserves digits, decimal points, and minus signs while removing everything else. This is crucial for financial data, where the negative sign carries important meaning.
Performance Considerations for Large DataFrames
When working with large datasets, performance becomes important. Here’s how the different methods stack up:
- str.replace(): Generally fast and efficient for most operations.
- apply() with lambda: Slower for large DataFrames as it applies Python-level functions.
- to_numeric(): Very efficient for converting to numeric types.
- str.extract(): Great for complex patterns, but can be slower than simple replacements.
For a DataFrame with millions of rows, I recommend using vectorized operations like str.replace() or to_numeric() rather than apply() with lambda functions.
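A rough timing sketch along these lines is easy to run yourself; exact numbers depend on your pandas version, dtype, and data, so treat the printed timings as indicative only:

```python
import time
import pandas as pd

# Compare str.replace against apply() with a Python-level lambda
# on a moderately large Series of phone-number strings
s = pd.Series(['(555) 123-4567'] * 100_000)

t0 = time.perf_counter()
vec = s.str.replace(r'\D', '', regex=True)
t_replace = time.perf_counter() - t0

t0 = time.perf_counter()
app = s.apply(lambda x: ''.join(filter(str.isdigit, x)))
t_apply = time.perf_counter() - t0

assert vec.equals(app)  # both approaches yield identical cleaned strings
print(f'str.replace: {t_replace:.3f}s  apply: {t_apply:.3f}s')
```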
In my experience working with US customer data, I’ve found the str.replace() method to be the most versatile for cleaning up phone numbers, zip codes, and social security numbers, where you need to strip out all formatting characters.
All these methods have their place depending on your specific needs. I hope these examples from my years of Python data wrangling help you clean your data more effectively.

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last five years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working with clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere. Check out my profile.