How to drop non-numeric columns from Pandas DataFrame [3 ways]

This Pandas tutorial covers three methods to drop non-numeric columns from a Pandas DataFrame in Python.

We mostly focus on how to drop non-numeric columns from a DataFrame, but we will also look at why and when to do it. Answering these questions gives us a strong foundation in Python Pandas.

Here is the list of the methods that can be used to drop all non-numeric columns in Pandas:

  • The DataFrame._get_numeric_data() method
  • The select_dtypes(['number']) method
  • The pd.to_numeric() method

Moreover, we will also cover the following topics:

  • Why drop non-numeric columns in Pandas
  • When to drop all non-numeric columns in Pandas

At the end of this Python tutorial, we will understand why and when to drop non-numeric columns, and which non-numeric columns should be dropped from a dataset for better analysis.

Import the Dataset

We can load the tips dataset directly from seaborn, or read it from a CSV file that was downloaded manually.

# Import the necessary libraries
import pandas as pd
import seaborn as sns

# Option 1: load the dataset using the seaborn library, without downloading it manually
data = sns.load_dataset("tips")
data.head()

# Option 2: load the dataset after downloading it manually from Kaggle
# data = pd.read_csv("tips.csv")
# data.head()

The output shows that the tips dataset has the columns total_bill, tip, sex, smoker, day, time, and size. Each row records one restaurant bill together with details about the party, which we can use for further analysis.

Figure: Import or load the tips dataset from Seaborn in Python

This way, we can load or import the dataset in Python. Here, we have imported the tips dataset.
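
Before dropping anything, it can help to check which columns Pandas treats as non-numeric. The following is a minimal sketch of that check. It assumes a small hand-made DataFrame (hypothetical values that only imitate the shape of the tips dataset) so that it runs without downloading anything.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample rows that imitate the tips dataset (not the real data)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": pd.Categorical(["Female", "Male", "Male"]),
    "day": ["Sun", "Sun", "Sat"],
    "size": [2, 3, 3],
})

# Show the dtype of every column
print(df.dtypes)

# Collect the names of the columns that are not numeric
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
print(non_numeric)
```

In this sketch, sex and day are reported as non-numeric, so they are the columns the methods in this tutorial would remove.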

How to drop non-numeric columns from Pandas DataFrame

There are several methods to drop the non-numeric columns from a Pandas DataFrame in Python. We can use the following functions, which already exist in the Pandas library:

  • DataFrame._get_numeric_data()
  • select_dtypes(['number'])
  • pd.to_numeric()

How to drop non-numeric columns in Pandas using the “DataFrame._get_numeric_data()” method

The “DataFrame._get_numeric_data()” method returns only the numeric columns of a Pandas DataFrame and leaves out the non-numeric ones. The leading underscore marks it as a private method, so it is not part of the documented public API.

  • In the code below, the built-in method “_get_numeric_data()” returns the numeric columns of the “data” dataset.
  • Instead of dropping the non-numeric columns from the original dataset, we have initialized a new variable, data_numeric, to store the numeric data. The original DataFrame stays unchanged.
# Dropping all non numeric columns and storing only numeric columns of a dataset 
data_numeric = data._get_numeric_data()
data_numeric

The output shows that all the non-numeric columns are dropped from the loaded dataset, and the remaining numeric columns are stored in the ‘data_numeric’ variable.

Figure: Drop non-numeric columns from a dataset

This way, we can drop non-numeric columns from Pandas DataFrame in Python.
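
As a small self-contained sketch of this method, the example below applies _get_numeric_data() to a hand-made DataFrame (hypothetical values, not the real tips data). Because the method is private, its behavior may change between Pandas versions, so treat this as an illustration rather than a recommended API.

```python
import pandas as pd

# Hypothetical sample rows that imitate the tips dataset (not the real data)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "day": ["Sun", "Sun", "Sat"],
    "size": [2, 3, 3],
})

# Keep only the numeric columns in a new DataFrame
numeric_df = df._get_numeric_data()
print(list(numeric_df.columns))

# The original DataFrame still has the text column
print(list(df.columns))
```

The text column day is missing from numeric_df but is still present in df, which shows that the method selects columns rather than modifying the original data.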

Drop non-numeric columns in Pandas using the “select_dtypes(['number'])” method

The “select_dtypes(['number'])” method returns only the numeric columns of a Pandas DataFrame and leaves out the non-numeric ones.

  • In the code below, the built-in method “select_dtypes(['number'])” returns the numeric columns of the “data” dataset, because we passed the 'number' datatype to select_dtypes().
  • Instead of dropping the non-numeric columns from the original dataset, we have initialized a new variable, data_numeric, to store the numeric data. The original DataFrame stays unchanged.
# Dropping all non numeric columns and storing only numeric columns of a dataset 
data_numeric=data.select_dtypes(['number'])
data_numeric

The output shows that all the non-numeric columns are dropped from the loaded dataset, and the remaining numeric columns are stored in the ‘data_numeric’ variable.

Figure: Drop non-numeric columns from a DataFrame or dataset

This way, we can drop non-numeric columns from a DataFrame or dataset in Python using the select_dtypes(['number']) method.
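
The same method can also be turned around: instead of selecting the numeric columns, we can select the non-numeric ones with the exclude parameter and pass their names to drop(). The sketch below assumes a hand-made DataFrame (hypothetical values, not the real tips data) and is only one possible way to write this.

```python
import pandas as pd

# Hypothetical sample rows that imitate the tips dataset (not the real data)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "day": ["Sun", "Sun", "Sat"],
    "size": [2, 3, 3],
})

# Names of the columns whose dtype is not numeric
non_numeric_cols = df.select_dtypes(exclude="number").columns

# drop() returns a new DataFrame without those columns
dropped = df.drop(columns=non_numeric_cols)

# Selecting the numeric columns directly gives the same result
selected = df.select_dtypes(include="number")
print(dropped.equals(selected))
```

Both routes give the same DataFrame here. The drop() version reads closer to "dropping" columns, while the include version is shorter.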


Drop non-numeric columns from pandas DataFrame using the method “pd.to_numeric()” in Python

The pd.to_numeric() function converts its argument (a scalar, a list, or a Series) to a numeric datatype. When errors='coerce' is passed, any value it cannot convert becomes NaN.

  • In the code below, errors='coerce' is passed to pd.to_numeric(), so the function tries to convert every cell to a numeric datatype and replaces any cell it cannot convert with NaN.
  • Then the Pandas dropna() method is called with axis=1, which drops every column that contains at least one NaN. The non-numeric columns are now filled with NaN, so they are dropped. Note that a numeric column with missing values is dropped as well, and a text column whose values all look like numbers is converted and kept.
# Convert every cell to a number (NaN on failure), then drop the columns that contain NaN
# Note: DataFrame.applymap() is deprecated since pandas 2.1; newer versions offer DataFrame.map() instead
data_numeric = data.applymap(lambda x: pd.to_numeric(x, errors='coerce')).dropna(axis=1)
data_numeric

The output shows that all the non-numeric columns are dropped from the loaded dataset, and the remaining numeric columns are stored in the data_numeric variable.

Figure: Drop non-numeric columns from a DataFrame

This way, we can drop non-numeric columns from Pandas DataFrame using the method “pd.to_numeric()” in Python.
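
The sketch below illustrates how this approach behaves on columns that are not cleanly numeric or text. It assumes a hand-made DataFrame (hypothetical values, not the real tips data) and applies pd.to_numeric() column by column with DataFrame.apply() instead of cell by cell, which is a variation on the code above rather than the same call.

```python
import pandas as pd

# Hypothetical sample rows (not the real tips data)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, None, 3.50],     # numeric column with one missing value
    "size": ["2", "3", "3"],       # digits stored as strings
    "day": ["Sun", "Sun", "Sat"],  # plain text
})

# Convert each column; values that cannot be converted become NaN
converted = df.apply(pd.to_numeric, errors="coerce")

# Drop every column that contains at least one NaN
strict = converted.dropna(axis=1)
print(list(strict.columns))

# Drop only the columns that are entirely NaN
lenient = converted.dropna(axis=1, how="all")
print(list(lenient.columns))
```

With the default dropna(axis=1), the tip column is lost because of its single missing value, while size survives because its strings convert to numbers. Passing how="all" keeps tip, at the cost of also keeping any text column that happens to contain a few number-like values.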

Why drop non-numeric columns in Pandas

So far, we have learned how to drop non-numeric columns. Now let us concentrate on why we drop non-numeric columns from a Pandas DataFrame in Python:

  • Humans can understand categorical data, but machines such as computers only work with numbers. Everything we pass as input must first be converted to numbers before the machine can process it.
  • We often drop non-numeric columns in Pandas to reduce complexity when they are not needed for the analysis.

When to drop non-numeric columns in Pandas

Now let us concentrate on when to drop non-numeric columns from a Pandas DataFrame in Python:

  • Dropping all the non-numeric columns in a Pandas dataset is not always the best choice.
  • We should drop the non-numeric columns only if they are unimportant to the analysis.
  • If there is an essential non-numeric column in our dataset, then instead of dropping it, we can convert it to numeric values using techniques like label encoding, one-hot encoding, etc.
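
As a rough sketch of the two encoding techniques mentioned above, the example below uses tools that ship with Pandas: category codes for label encoding and pd.get_dummies() for one-hot encoding. The DataFrame is hand-made (hypothetical values), and dedicated encoders from other libraries may suit a real project better.

```python
import pandas as pd

# Hypothetical sample rows (not the real tips data)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "day": ["Sun", "Sun", "Sat"],
})

# Label encoding: replace each category with an integer code
label_encoded = df.copy()
label_encoded["day"] = label_encoded["day"].astype("category").cat.codes
print(label_encoded)

# One-hot encoding: create one indicator column per category
one_hot = pd.get_dummies(df, columns=["day"])
print(list(one_hot.columns))
```

Label encoding keeps a single column but implies an order between the categories, while one-hot encoding avoids that implied order by adding one column per category.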

Conclusion

Through this Python Pandas tutorial, we saw three ways to drop the non-numeric columns from a Pandas DataFrame, using the _get_numeric_data(), select_dtypes(['number']), and pd.to_numeric() functions, with examples.

We have also seen why to drop non-numeric columns in Pandas and when to drop all non-numeric columns in Pandas.
