In this Python tutorial, we will cover several methods to drop non-numeric columns from a pandas DataFrame or dataset in Python.
We mostly focus on how to drop non-numeric columns, but we will also learn why and when to do so. Answering these questions gives us a strong foundation in Python Pandas. We will cover the following topics:
- How to drop non-numeric columns
- Drop non-numeric columns from pandas DataFrame using the method “DataFrame._get_numeric_data()” in Python
- Drop non-numeric columns from pandas DataFrame using the method “select_dtypes([‘number’])” in Python
- Drop non-numeric columns from pandas DataFrame using the method “pd.to_numeric()” in Python
- Why drop non-numeric columns of a DataFrame or dataset in Python
- When to drop non-numeric columns of a DataFrame or dataset in Python
It is common to drop important features from a dataset just because they are non-numeric columns, and this can reduce accuracy when building a model. This Python tutorial clears up that confusion.
By the end of this Python tutorial, we will have a clear understanding of why to drop non-numeric columns, when to drop them, and what kind of non-numeric columns should be dropped from the dataset for better analysis.
Import the Dataset
We can download the dataset from the Kaggle link below
“https://www.kaggle.com/datasets/ranjeetjain3/seaborn-tips-dataset”, or we can load the same dataset directly from the seaborn library.
#Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
#Load the dataset after downloading manually from kaggle
data=pd.read_csv("tips.csv")
data.head()
(or)
#Load the dataset using seaborn library without downloading
data=sns.load_dataset("tips")
data.head()
In the output, we can observe that the tips dataset has the columns total_bill, tip, sex, smoker, day, time, and size. Each row describes one customer's bill and is the basis for further analysis.
This way we can load or import the dataset in Python. Here we have imported the tips dataset.
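Before dropping anything, it helps to check which columns are actually numeric. The sketch below does not load the real tips data; it builds a small hypothetical DataFrame named `sample` with the same column names (the values are made up for illustration) and inspects the datatype of each column.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the tips dataset (made-up values)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "No"],
    "day": ["Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3],
})

# dtypes lists the datatype of every column
print(sample.dtypes)

# Collect the names of the columns whose datatype is numeric
numeric_cols = [col for col in sample.columns
                if pd.api.types.is_numeric_dtype(sample[col])]
print(numeric_cols)
```

On this sample, only total_bill, tip, and size are numeric, so these are the columns the methods in the following sections are expected to keep.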
How to drop non-numeric columns from Pandas DataFrame
There are several ways to drop the non-numeric columns from a Pandas DataFrame in Python. We can use the following functions, which already exist in the pandas library:
- DataFrame._get_numeric_data()
- select_dtypes([‘number’])
- pd.to_numeric()
Drop non-numeric columns from pandas DataFrame using “DataFrame._get_numeric_data()” method
The method “DataFrame._get_numeric_data()” returns only the numeric columns of a Pandas DataFrame and leaves out the non-numeric ones. The leading underscore marks it as a private pandas method, so its behavior may change between versions.
- In the code below, the built-in method “_get_numeric_data()” returns the numeric columns of the “data” dataset.
- Instead of dropping the non-numeric columns from the original dataset, we initialize a new variable “data_numeric” to store the numeric data.
# Dropping all non numeric columns and storing only numeric columns of a dataset
data_numeric = data._get_numeric_data()
data_numeric
In the output, we can observe that all the non-numeric columns are dropped from the loaded dataset and the remaining numeric columns are stored in the ‘data_numeric’ variable.
This way we can drop non-numeric columns from Pandas DataFrame in Python.
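As a minimal sketch of the point above, the following example runs the same call on a small hypothetical DataFrame (`sample`, with made-up values) and checks that the original DataFrame keeps all of its columns while the new variable holds only the numeric ones.

```python
import pandas as pd

# Hypothetical three-column sample (made-up values)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "sex": ["Female", "Male"],
    "size": [2, 3],
})

# _get_numeric_data() is a private pandas method (note the leading underscore)
numeric_only = sample._get_numeric_data()

print(list(numeric_only.columns))  # numeric columns only
print(list(sample.columns))        # the original DataFrame is unchanged
```

Because the result is stored in a separate variable, the non-numeric columns remain available in `sample` if they are needed later.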
Drop non-numeric columns from Pandas DataFrame using “select_dtypes([‘number’])” method
The method “select_dtypes([‘number’])” returns only the numeric columns of a Pandas DataFrame and leaves out the non-numeric ones.
- In the code below, “select_dtypes([‘number’])” returns the numeric columns of the “data” dataset because we pass the ‘number’ datatype to select_dtypes().
- Instead of dropping the non-numeric columns from the original dataset, we initialize a new variable “data_numeric” to store the numeric data.
# Dropping all non numeric columns and storing only numeric columns of a dataset
data_numeric=data.select_dtypes(['number'])
data_numeric
In the output, we can observe that all the non-numeric columns are dropped from the loaded dataset and the remaining numeric columns are stored in the ‘data_numeric’ variable.
This way we can drop non-numeric columns from a DataFrame or dataset in Python using the “select_dtypes([‘number’])” method.
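select_dtypes() also accepts an `exclude` argument, which gives a second way to reach the same result: find the non-numeric column names first and then drop them by name. The sketch below shows both routes on a small hypothetical DataFrame (`sample`, made-up values); the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical three-column sample (made-up values)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "sex": ["Female", "Male"],
    "size": [2, 3],
})

# Route 1: keep only the numeric columns
numeric_part = sample.select_dtypes(include="number")

# Route 2: find the non-numeric column names, then drop them by name
non_numeric_cols = sample.select_dtypes(exclude="number").columns
dropped = sample.drop(columns=non_numeric_cols)

print(list(non_numeric_cols))
print(dropped.equals(numeric_part))
```

The second route is handy when we want to log or review which columns are about to be removed before dropping them.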
Drop non-numeric columns from pandas DataFrame using the method “pd.to_numeric()” in Python
The function “pd.to_numeric()” tries to convert values to a numeric datatype. By default it raises an error when a value cannot be converted; with errors=’coerce’ it returns NaN for that value instead.
- In the code below, errors=’coerce’ is passed to pd.to_numeric(), so every cell is converted to a numeric datatype where possible and replaced with NaN where the conversion fails.
- Then the Pandas dropna() method is called with axis=1, which drops every column that contains at least one NaN. The non-numeric columns are now filled with NaN, so they are dropped. Note that a numeric column that already contains missing values is dropped as well, so this approach suits datasets whose numeric columns have no missing values.
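# Convert every cell to a number (unparseable cells become NaN), then drop every column that contains NaN
# Note: DataFrame.applymap() is deprecated since pandas 2.1 in favor of DataFrame.map()
data_numeric=data.applymap(lambda x: pd.to_numeric(x, errors='coerce')).dropna(axis=1)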
data_numeric
In the output, we can observe that all the non-numeric columns are dropped from the loaded dataset and the remaining numeric columns are stored in the ‘data_numeric’ variable.
This way we can drop non-numeric columns from Pandas DataFrame using “pd.to_numeric()” in Python.
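The interaction between errors='coerce' and dropna() is easier to see on a small example. The sketch below is not the tutorial's exact code: it applies pd.to_numeric column by column with DataFrame.apply() instead of cell by cell, and it uses a hypothetical DataFrame (`sample`, made-up values) that contains numbers stored as text and a numeric column with one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values)
sample = pd.DataFrame({
    "total_bill": ["16.99", "10.34", "21.01"],  # numbers stored as text
    "tip": [1.01, np.nan, 3.50],                # numeric, one missing value
    "sex": ["Female", "Male", "Male"],          # genuinely non-numeric
})

# Convert column by column; values that cannot be parsed become NaN
coerced = sample.apply(pd.to_numeric, errors="coerce")

# Dropping columns with ANY NaN also removes "tip" because of its missing value
strict = coerced.dropna(axis=1)

# Dropping only columns that are ENTIRELY NaN keeps "tip"
lenient = coerced.dropna(axis=1, how="all")

print(list(strict.columns))
print(list(lenient.columns))
```

Unlike select_dtypes(), this approach keeps a column such as total_bill whose numbers were read in as text. Passing how="all" is one possible way to avoid losing numeric columns that merely have gaps.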
Why drop non-numeric columns of a DataFrame or dataset in Python
So far we have learned how to drop non-numeric columns. Now let us concentrate on why we drop non-numeric columns from a Pandas DataFrame in Python:
- Humans can read categorical data, but most machine learning algorithms operate only on numbers, so every input has to be converted to numbers before a model can use it.
- To reduce confusion and complexity, we usually drop the non-numeric columns in Python Pandas.
When to drop non-numeric columns of a DataFrame or dataset in Python
Now let us concentrate on when to drop non-numeric columns from a Pandas DataFrame in Python:
- Dropping all the non-numeric columns in a Pandas dataset is not always the best choice.
- We should drop a non-numeric column only if it is unimportant to the analysis.
- If there is an important non-numeric column in our dataset, then instead of dropping it we convert it to numeric values using techniques like label encoding, one-hot encoding, etc.
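The two encoding techniques mentioned above can be sketched with plain pandas. This is a minimal illustration on a hypothetical DataFrame (`sample`, made-up values), not a full preprocessing pipeline; libraries such as scikit-learn offer dedicated encoders as well.

```python
import pandas as pd

# Hypothetical sample (made-up values)
sample = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "day": ["Sun", "Sat", "Sun"],
})

# Label encoding: replace each category with an integer code
label_encoded = sample.copy()
label_encoded["sex"] = label_encoded["sex"].astype("category").cat.codes

# One-hot encoding: create one indicator column per category of "day"
one_hot = pd.get_dummies(sample, columns=["day"])

print(label_encoded["sex"].tolist())
print(list(one_hot.columns))
```

Label encoding keeps a single column but implies an order between categories, while one-hot encoding avoids that implied order at the cost of extra columns.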
Conclusion
In this Python pandas tutorial, we have covered the following topics:
- When to drop non-numeric columns from Pandas DataFrame
- Why drop non-numeric columns from Pandas DataFrame
We also saw different methods to drop the non-numeric columns from a Pandas DataFrame, namely pd.to_numeric(), select_dtypes([‘number’]), and _get_numeric_data().
Also, we can follow the below Pandas Python tutorials
- How to get index of rows in Pandas DataFrame
- How to drop rows with NaN or missing values in Pandas DataFrame
- Pandas add a new column to an existing DataFrame
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.