How to Remove All Non-numeric Characters in Pandas [4 Methods]

If you need to remove non-numeric characters from a Pandas dataframe in Python, this article walks through several ways to do it.

Non-numeric characters are everything except digits, such as letters, punctuation, symbols, and whitespace. Removing them is a common data-cleaning step that gets a column ready for further processing or analysis in Python.

In this Python post, we'll demonstrate how to use various functions and techniques to eliminate non-numeric characters from a pandas dataframe column.

We can use Python and Pandas tools such as str.replace(), str.extract(), lambda functions, and regular expressions to remove non-numeric characters from a dataframe column.

Let's look at each method in detail.

  1. Using Pandas Series.str.extract() Method
  2. Using re.sub() with apply() Method
  3. Using lambda Function
  4. Using pd.to_numeric() Function

1. Remove all non-numeric characters in Pandas using the Series.str.extract() method

The Series.str.extract() method applies a regular expression to each string in a Series and returns the text matched by the pattern's capture groups (the parts of the pattern in parentheses).

Here, we create a dataframe whose Zipcode column mixes state abbreviations with digits. To remove the non-numeric characters, we pass str.extract() a pattern that captures only the digits.

Here is an example:

import pandas as pd

Cities = {
    'Name': ['New York', 'California', 'Texas', 'Florida', 'Illinois'],
    'Zipcode': ['NY10001', 'CA90210', 'TX77001', 'FL33101', 'IL60601']
}
Cities_df = pd.DataFrame(Cities)
# r'(\d+)' captures the first run of digits; expand=False returns a Series, not a DataFrame
Cities_df['Zipcode'] = Cities_df['Zipcode'].str.extract(r'(\d+)', expand=False)
print(Cities_df)

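Here, we extract the numeric characters from the Zipcode column of the Pandas dataframe.

In the pattern, \d is a shorthand character class that matches a digit. For ASCII text such as these zip codes it is equivalent to [0-9], although with Python's re module it also matches non-ASCII decimal digits. The + quantifier matches one or more occurrences of the preceding token, and the parentheses form the capture group that str.extract() returns. Note that str.extract() returns only the first match in each string, and NaN when a string contains no digits.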

Cities_df['Zipcode'].str.extract(r'(\d+)')

Output:

         Name Zipcode
0    New York   10001
1  California   90210
2       Texas   77001
3     Florida   33101
4    Illinois   60601

Upon executing the code in PyCharm, the resulting output is displayed in the screenshot below.

[Screenshot: Remove All Non-numeric Characters in Pandas in Python]
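To see how the (\d+) pattern behaves on messier values, here is a small sketch with hypothetical inputs. It shows that str.extract() captures only the first run of digits and returns NaN when there are none, that str.findall() followed by str.join('') keeps every run instead, and how \d differs from [0-9] on non-ASCII digits.

```python
import pandas as pd

# Hypothetical values: a normal zip code, two separate digit runs, and no digits at all
codes = pd.Series(['NY10001', 'A1B2', 'NONE'])

# str.extract() keeps only the FIRST run of digits; expand=False returns a Series
first_run = codes.str.extract(r'(\d+)', expand=False)
print(first_run.tolist())  # ['10001', '1', nan]

# str.findall() returns every run as a list; str.join('') glues each list together
all_digits = codes.str.findall(r'\d+').str.join('')
print(all_digits.tolist())  # ['10001', '12', '']

# \d also matches non-ASCII decimal digits (here Arabic-Indic three and four),
# while [0-9] matches ASCII digits only
eastern = pd.Series(['ID\u0663\u0664'])
print(eastern.str.extract(r'(\d+)', expand=False).tolist())    # ['٣٤']
print(eastern.str.extract(r'([0-9]+)', expand=False).tolist())  # [nan]
```

If your data can contain digits from other scripts and you only want 0 to 9, spell the class out as [0-9] rather than relying on \d.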

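Note: str.extract() returns the digits as strings. We can use astype() to cast the column to another data type, for example Cities_df['Zipcode'].astype(int) to get integers. Be aware that astype(int) fails if any value is NaN (a string that contained no digits), and that converting zip codes to integers drops leading zeros, so keep them as strings when that matters.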
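Here is a short sketch of that conversion using hypothetical zip codes. It goes through pd.to_numeric() and then astype('Int64'), pandas' nullable integer dtype, so a value without digits becomes <NA> instead of making the cast fail. It also shows the leading zero of 02134 disappearing.

```python
import pandas as pd

# Hypothetical zip codes: one with a leading zero, one with no digits at all
zips = pd.Series(['NY10001', 'MA02134', 'UNKNOWN'])
digits = zips.str.extract(r'(\d+)', expand=False)  # strings: '10001', '02134', NaN

# digits.astype(int) would raise here because of the NaN, so parse with
# pd.to_numeric() first, then cast to the nullable 'Int64' dtype (capital I)
as_int = pd.to_numeric(digits).astype('Int64')
print(as_int.tolist())  # [10001, 2134, <NA>]
```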

2. Pandas remove non-numeric characters using re.sub() with the apply() method

We can also combine the re.sub() function from Python's built-in re module with the Series.apply() method. re.sub() replaces every match of a pattern with a replacement string, so replacing each run of non-digit characters with an empty string leaves only the digits. apply() then calls that function on every value in the column.

Here is an example that removes every non-digit character from the Zipcode column:

import re

import pandas as pd


def remove_non_numerics(s):
    # re.sub() only accepts strings, so leave missing values (NaN/None) unchanged
    if not isinstance(s, str):
        return s
    # Replace every run of non-digit characters with an empty string
    return re.sub('[^0-9]+', '', s)


Employee = {
    'Name': ['Chandler', 'Monica', 'Rachel', 'Joey', 'Ross'],
    'Zipcode': ['NY10001', 'CA90210', 'TX77001', 'FL33101', 'IL60601']
}
Employee_df = pd.DataFrame(Employee)
Employee_df['Zipcode'] = Employee_df['Zipcode'].apply(remove_non_numerics)
print(Employee_df)

The re.sub() function is called here with three arguments: the pattern to match, the replacement string, and the string to search. The pattern [^0-9]+ matches one or more characters that are not digits (a ^ at the start of a bracket expression negates it), and each match is replaced with an empty string:

re.sub('[^0-9]+', '', s)

Output:

       Name Zipcode
0  Chandler   10001
1    Monica   90210
2    Rachel   77001
3      Joey   33101
4      Ross   60601

The screenshot below shows the output after running the code in the PyCharm editor.

[Screenshot: pandas remove all non-numeric characters in Python]
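A vectorized alternative worth knowing is pandas' Series.str.replace() with regex=True, which applies the same [^0-9]+ pattern without writing a Python helper function. A bare re.sub() call inside apply() raises a TypeError on a missing value because re.sub() only accepts strings, whereas str.replace() passes missing values through. The 'Phoebe' row with a missing zip code is hypothetical data added for illustration.

```python
import pandas as pd

# Hypothetical data: the last row has a missing zip code
employee_df = pd.DataFrame({
    'Name': ['Chandler', 'Monica', 'Phoebe'],
    'Zipcode': ['NY10001', 'CA90210', None],
})

# regex=True treats the pattern as a regular expression; each run of
# non-digit characters is replaced with '' and missing values stay missing
employee_df['Zipcode'] = employee_df['Zipcode'].str.replace(r'[^0-9]+', '', regex=True)
print(employee_df)
```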

3. Remove non-numeric characters in Python Pandas using a lambda function

A lambda is a small anonymous function written as a single expression, which makes it convenient to pass directly to methods such as map() or apply().

Inside the lambda, a list comprehension with an if clause keeps only the characters for which the str.isdigit() method returns True, and ''.join() concatenates those characters back into a single string.

The Series.map() method calls the lambda on each value in the column and returns a new Series with the results, which we assign back to the column.

Here is an example that strips the parentheses, spaces, and hyphens from phone numbers:

import pandas as pd

Employee = {
    'Name': ['Chandler', 'Monica', 'Rachel'],
    'Phone_number': ['(123) 456-7890', '(555) 987-6543', '(111) 222-3333']
}
Employee_df = pd.DataFrame(Employee)
# For each value, keep only the characters where isdigit() is True and join them back together
Employee_df['Phone_number'] = Employee_df['Phone_number'].map(lambda i: ''.join([x for x in i if x.isdigit()]))
print(Employee_df)

Output:

       Name Phone_number
0  Chandler   1234567890
1    Monica   5559876543
2    Rachel   1112223333

The screenshot below shows the output after executing the code in the PyCharm editor.

[Screenshot: pandas remove non-numeric characters from column in Python]
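Two caveats apply to this lambda. str.isdigit() also returns True for characters such as the superscript ² that are not ordinary decimal digits, and the lambda fails on a missing value because None or NaN is not a string. The sketch below, with hypothetical phone values, compares isdigit() with the stricter str.isdecimal() and passes na_action='ignore' to Series.map() so missing values are skipped.

```python
import pandas as pd

# Hypothetical values: a normal number, a missing value, and one containing a superscript two
phones = pd.Series(['(123) 456-7890', None, 'ext. \u00b25'])

# na_action='ignore' leaves missing values as they are instead of calling the lambda on them
with_isdigit = phones.map(lambda s: ''.join(ch for ch in s if ch.isdigit()), na_action='ignore')
with_isdecimal = phones.map(lambda s: ''.join(ch for ch in s if ch.isdecimal()), na_action='ignore')

print(with_isdigit.iloc[2])    # ²5  (the superscript is kept)
print(with_isdecimal.iloc[2])  # 5
```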

4. How to remove non-numeric data from a Pandas dataframe using pd.to_numeric function

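The pd.to_numeric() function from Python Pandas converts string or object data to a numeric dtype. By default it raises an error on values it cannot parse; with errors="coerce", those values become NaN instead. We can then use the dropna() method to filter out the rows containing NaN. Note that this approach does not strip characters out of a value: a string such as "10%" is dropped entirely rather than cleaned to 10.

Here is an example that converts the sales column and removes the rows whose values are not valid numbers: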

import pandas as pd

Sales_report = pd.DataFrame({
    "Products": ["Kitkat", "Bounty", "Twix", "Mars", "Lindit"],
    "sales": ["150.00", "5000", "-", "1000", "10%"]
})
Sales_report["sales"] = pd.to_numeric(Sales_report["sales"], errors="coerce")
Sales_report.dropna(subset=["sales"], inplace=True)
print(Sales_report)

Output:

  Products   sales
0   Kitkat   150.0
1   Bounty  5000.0
3     Mars  1000.0

The screenshot below illustrates the output after the code was run in the PyCharm editor.

[Screenshot: How to remove all non-numeric characters from all the values in a particular column in pandas dataframe in Python]
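If you would rather keep a value like "10%" as 10 than drop the row, one option is to strip the unwanted characters first and then coerce. This sketch reuses the raw sales values from the example above and keeps digits, '.' and '-' so that decimals and negative numbers survive; that character set is an assumption about the data. It also shows that without errors="coerce", pd.to_numeric() raises a ValueError.

```python
import pandas as pd

# Same raw values as the sales column in the example above
sales = pd.Series(["150.00", "5000", "-", "1000", "10%"])

# With the default errors="raise", the first unparseable value raises ValueError
try:
    pd.to_numeric(sales)
except ValueError as exc:
    print("to_numeric raised:", exc)

# Assumption: only digits, '.' and '-' are meaningful; strip everything else, then coerce
stripped = sales.str.replace(r'[^0-9.-]', '', regex=True)
cleaned = pd.to_numeric(stripped, errors="coerce")
print(cleaned.tolist())  # [150.0, 5000.0, nan, 1000.0, 10.0]
```

A lone "-" still cannot be parsed, so it becomes NaN and can be dropped with dropna() as before.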

Conclusion

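In this article, we covered four ways to handle non-numeric characters in a Pandas dataframe column in Python: Series.str.extract(), re.sub() with apply(), a lambda function with map(), and pd.to_numeric().

Choose the method that fits your data: str.extract() when you need the first run of digits in each value, re.sub() with apply() or the lambda with map() when you want to strip every non-digit character, and pd.to_numeric() with errors="coerce" when you want to convert the column to numbers and discard entries that are not valid numbers.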
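To make the choice concrete, the sketch below runs a simplified version of each method on the same hypothetical values and puts the results side by side. Note how extract() keeps only the first run of digits, the re.sub() and isdigit() approaches keep every digit, and to_numeric() only succeeds on a value that is already a clean number.

```python
import re

import pandas as pd

# Hypothetical values chosen to show where the methods disagree
values = pd.Series(['NY10001', 'A1B2', '10%', '42', 'n/a'])

comparison = pd.DataFrame({
    'original': values,
    # Method 1: first run of digits only (NaN if there is none)
    'extract': values.str.extract(r'(\d+)', expand=False),
    # Method 2: delete every run of non-digit characters
    're_sub': values.apply(lambda s: re.sub(r'[^0-9]+', '', s)),
    # Method 3: keep the characters for which str.isdigit() is True
    'isdigit_join': values.map(lambda s: ''.join(ch for ch in s if ch.isdigit())),
    # Method 4: parse the whole value as a number (NaN if it is not one)
    'to_numeric': pd.to_numeric(values, errors='coerce'),
})
print(comparison)
```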
