This Pandas tutorial covers several methods to drop non-numeric columns from Pandas DataFrames in Python.
Most tutorials focus only on how to drop non-numeric columns from a DataFrame, but here we will also learn why and when to do it. Answering these questions will give us a strong foundation in Python Pandas.
Here is the list of methods that can be used to drop all non-numeric columns in Pandas:
- The DataFrame._get_numeric_data() method
- The select_dtypes(['number']) method
- The pd.to_numeric() method
Moreover, we will also cover the following topics:
- Why drop non-numeric columns in Pandas
- When to drop all non-numeric columns in Pandas
By the end of this Python tutorial, we will understand why and when to drop non-numeric columns, and which non-numeric columns should be dropped from a dataset for better analysis.
Import the Dataset
We can load the tips dataset directly with seaborn, or read it from a CSV file downloaded manually from Kaggle.
# Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
# Load the dataset after downloading it manually from Kaggle
data = pd.read_csv("tips.csv")
data.head()
(or)
# Load the dataset with seaborn; it fetches the file from the internet, so no manual download is needed
data = sns.load_dataset("tips")
data.head()
The output of data.head() shows the first five rows of the tips dataset, with the columns total_bill, tip, sex, smoker, day, time, and size. Each row describes one restaurant bill, which gives us the information we need for further analysis.
This way, we can load or import the dataset in Python. Here, we have imported the tips dataset.
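If seaborn cannot reach the internet and no CSV file is at hand, a small stand-in DataFrame is enough to follow along. The minimal sketch below assumes a hypothetical three-row table with the same column names as tips; the values are only illustrative. It uses pandas.api.types.is_numeric_dtype to list the columns that pandas treats as numeric, which are the columns we want to keep.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for the tips dataset: same column names, illustrative values
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "No"],
    "day": ["Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3],
})

# Show the dtype pandas inferred for every column
print(data.dtypes)

# List the columns whose dtype pandas considers numeric
numeric_cols = [col for col in data.columns if is_numeric_dtype(data[col])]
print(numeric_cols)  # ['total_bill', 'tip', 'size']
```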
How to drop non-numeric columns from Pandas DataFrame
There are several ways to drop non-numeric columns from a Pandas DataFrame. We can use the following functions, which are already available in Pandas:
- DataFrame._get_numeric_data()
- select_dtypes(['number'])
- pd.to_numeric()
How to drop non-numeric columns in Pandas using the DataFrame._get_numeric_data() method
The DataFrame._get_numeric_data() method returns a new DataFrame that contains only the numeric columns and leaves out the non-numeric ones. The leading underscore marks it as a private method, so it is not part of the documented public pandas API.
- In the code below, _get_numeric_data() returns the numeric columns of the data DataFrame.
- Instead of dropping the non-numeric columns from the original dataset, we store the numeric columns in a new variable, data_numeric. This leaves data itself unchanged.
# Dropping all non numeric columns and storing only numeric columns of a dataset
data_numeric = data._get_numeric_data()
data_numeric
In the output, all the non-numeric columns (sex, smoker, day, and time) are dropped from the loaded dataset. The remaining numeric columns (total_bill, tip, and size) are stored in the data_numeric variable.
This way, we can drop non-numeric columns from Pandas DataFrame in Python.
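To see what _get_numeric_data() returns without loading tips, here is a minimal sketch on a small hypothetical DataFrame. The column names price, qty, in_stock, and item are made up. In current pandas versions, this method also keeps boolean columns, because pandas treats bool as a numeric kind internally. Since the method is private, this behaviour is not a documented guarantee.

```python
import pandas as pd

# Small hypothetical DataFrame mixing numeric and non-numeric columns
df = pd.DataFrame({
    "price": [9.5, 12.0, 7.25],       # float64
    "qty": [1, 3, 2],                 # int64
    "in_stock": [True, False, True],  # bool
    "item": ["pen", "book", "mug"],   # strings
})

# Private helper: returns a new DataFrame with the numeric (and boolean) columns
numeric_part = df._get_numeric_data()
print(list(numeric_part.columns))  # ['price', 'qty', 'in_stock']

# The original DataFrame is left unchanged
print(list(df.columns))  # ['price', 'qty', 'in_stock', 'item']
```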
Pandas drop non-numeric columns using the select_dtypes(['number']) method
The select_dtypes(['number']) method returns a new DataFrame containing only the columns with a numeric dtype (such as integer or float) and leaves out the non-numeric ones.
- In the code below, select_dtypes(['number']) keeps only the numeric columns of the data DataFrame. It does this because we pass 'number' to its include parameter, which is the first positional argument.
- Instead of dropping the non-numeric columns from the original dataset, we store the numeric columns in a new variable, data_numeric. This leaves data itself unchanged.
# Dropping all non numeric columns and storing only numeric columns of a dataset
data_numeric = data.select_dtypes(['number'])
data_numeric
In the output, all the non-numeric columns are dropped from the loaded dataset, and the remaining numeric columns are stored in the data_numeric variable.
This way, we can drop non-numeric columns from a DataFrame in Python using the select_dtypes(['number']) method.
Drop non-numeric columns from a Pandas DataFrame using the pd.to_numeric() function in Python
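select_dtypes() also accepts an exclude argument, which is handy for checking what was dropped. The sketch below uses a small hypothetical DataFrame with made-up column names. It also shows one difference from _get_numeric_data(): with include='number', boolean columns are not selected, so we must request them explicitly if we want to keep them.

```python
import pandas as pd

# Small hypothetical DataFrame mixing numeric, boolean and text columns
df = pd.DataFrame({
    "price": [9.5, 12.0, 7.25],
    "qty": [1, 3, 2],
    "in_stock": [True, False, True],
    "item": ["pen", "book", "mug"],
})

# Public API: keep only the numeric columns (bool does not count as 'number')
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))  # ['price', 'qty']

# The complementary call shows exactly which columns were dropped
dropped = df.select_dtypes(exclude="number")
print(list(dropped.columns))  # ['in_stock', 'item']

# To keep boolean columns too, list both kinds explicitly
with_bools = df.select_dtypes(include=["number", "bool"])
print(list(with_bools.columns))  # ['price', 'qty', 'in_stock']
```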
The pd.to_numeric() function converts its argument (a scalar, list, tuple, 1-D array, or Series) to a numeric dtype. By default (errors='raise'), it raises an error when a value cannot be parsed as a number. With errors='coerce', such values are returned as NaN instead.
- In the code below, applymap() calls pd.to_numeric() on every cell of the DataFrame with errors='coerce'. Each value that cannot be converted to a number is replaced with NaN.
- Then dropna(axis=1) drops every column that contains at least one NaN. A text column contains values that cannot be parsed, so it now contains NaN and is dropped. Be aware that this also drops a numeric column that already had missing values. It also keeps a text column whose values all parse as numbers (for example, "10" and "20").
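# Dropping all non-numeric columns and storing only numeric columns of a dataset
# applymap() is deprecated since pandas 2.1 in favor of DataFrame.map(); it still runs but emits a deprecation warning
data_numeric = data.applymap(lambda x: pd.to_numeric(x, errors='coerce')).dropna(axis=1)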
data_numeric
In the output, all the non-numeric columns are dropped from the loaded dataset, and the remaining numeric columns are stored in the data_numeric variable.
This way, we can drop non-numeric columns from a Pandas DataFrame using the pd.to_numeric() function in Python.
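The dropna(axis=1) step has two side effects that are easy to check on a small hypothetical DataFrame; the columns name, score, and code are made up. The sketch below applies pd.to_numeric() column by column with DataFrame.apply() rather than cell by cell with applymap(). We chose this swap for brevity; it is not the tutorial's exact code. The sketch then compares the dropna(axis=1) approach with an alternative that drops a column only when coercion actually failed on one of its existing values.

```python
import pandas as pd

# Hypothetical DataFrame: one text column, one numeric column with a gap,
# and one text column whose values all look like numbers
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [1.0, None, 3.0],
    "code": ["10", "20", "30"],
})

# Coerce column by column (apply instead of the tutorial's applymap);
# values that cannot be parsed become NaN
coerced = df.apply(pd.to_numeric, errors="coerce")

# Tutorial approach: drop every column that now contains any NaN
naive = coerced.dropna(axis=1)
print(list(naive.columns))  # ['code']  ("score" is lost because of its missing value)

# Alternative: drop a column only where coercion turned an existing value into NaN
failed = coerced.isna() & df.notna()
safer = coerced.loc[:, ~failed.any()]
print(list(safer.columns))  # ['score', 'code']
```

In both results, code is kept and converted to integers. If such columns should stay as text, select_dtypes() is the safer choice.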
Why drop non-numeric columns in Pandas
Till now, we have learned how to drop non-numeric columns. Now let us concentrate on why we drop non-numeric columns from a Pandas DataFrame in Python:
- Humans can understand categorical data, but computers and many machine learning algorithms work only with numbers. So, everything we pass to such a model must first be converted to numbers.
- Many numerical operations, such as computing correlations or fitting a model, accept only numeric input. We therefore often drop the non-numeric columns to reduce confusion and complexity.
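A correlation matrix is a good example of an operation that needs numeric input, because it is computed only between numeric columns. In pandas 2.0 and later, DataFrame.corr() defaults to numeric_only=False and raises an error if text columns are present. The sketch below uses a few illustrative tips-style rows and selects the numeric columns first. Passing numeric_only=True to corr() gives the same result.

```python
import pandas as pd

# A few illustrative tips-style rows with one text column
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "day": ["Sun", "Sun", "Sun", "Sun"],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations
corr = df.select_dtypes(include="number").corr()
print(corr)

# Equivalent shortcut: let corr() skip the non-numeric columns itself (pandas >= 1.5)
same = df.corr(numeric_only=True)
```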
When to drop non-numeric columns in Pandas
Now let us concentrate on when to drop non-numeric columns from a Pandas DataFrame in Python:
- Dropping all the non-numeric columns of a dataset is not always the right choice.
- We should drop a non-numeric column only if it is unimportant for the analysis.
- If a non-numeric column is important, we convert it to numeric values instead of dropping it, using techniques such as label encoding or one-hot encoding.
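Here is a minimal sketch of both encodings on a small hypothetical DataFrame. pd.factorize() gives a label encoding, with integer codes assigned in order of first appearance. pd.get_dummies() gives a one-hot encoding, with one 0/1 column per category. After encoding, every remaining column is numeric, so nothing important has to be dropped.

```python
import pandas as pd

# Hypothetical DataFrame with two categorical columns and one numeric column
df = pd.DataFrame({
    "sex": ["Female", "Male", "Male"],
    "day": ["Sun", "Sat", "Sun"],
    "tip": [1.01, 1.66, 3.50],
})

# Label encoding: one integer code per category, numbered in order of first appearance
codes, uniques = pd.factorize(df["sex"])
df["sex_code"] = codes
print(df["sex_code"].tolist())  # [0, 1, 1]
print(list(uniques))            # ['Female', 'Male']

# One-hot encoding: replace "day" with one 0/1 column per category;
# the original text column "sex" is dropped because sex_code now represents it
encoded = pd.get_dummies(df.drop(columns="sex"), columns=["day"], dtype=int)
print(list(encoded.columns))        # ['tip', 'sex_code', 'day_Sat', 'day_Sun']
print(encoded["day_Sun"].tolist())  # [1, 0, 1]
```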
Conclusion
Through this Python Pandas tutorial, we saw different methods to drop non-numeric columns from a Pandas DataFrame with examples: _get_numeric_data(), select_dtypes(['number']), and pd.to_numeric().
We have also seen why and when to drop non-numeric columns in Pandas.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.