While working with data in Python, the pandas library is an indispensable tool that I’ve relied on for years. One of the most common operations in data cleaning and preparation is removing unwanted rows or columns from your dataset.
Whether you’re dealing with missing values, redundant features, or a dataset that simply needs reshaping, the DataFrame drop() function is the tool for the job.
In this article, I’ll walk you through everything you need to know about the pandas DataFrame drop() function.
DataFrame drop() Function
The drop() function is a pandas method that removes rows or columns from a DataFrame. It’s like having a precise scalpel that lets you surgically remove the exact parts of your data that you don’t need.
The beauty of drop() is its flexibility; you can remove single or multiple rows/columns, use labels or indices, and even specify how the operation should be performed.
Basic Syntax of the drop() Function
Before getting into specific examples, let’s understand the basic syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

The key parameters include:
- labels: index labels to drop
- axis: 0 for rows, 1 for columns
- index/columns: an alternative to specifying labels and axis
- inplace: whether to modify the DataFrame directly or return a copy
- errors: 'raise' to throw an error if labels don’t exist, 'ignore' to skip them silently
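To see how these parameters relate to each other, here is a minimal sketch on made-up data (not from the article’s examples): index=/columns= are shorthand for labels= plus an axis, and errors='ignore' skips labels that don’t exist.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# columns= is equivalent to labels= with axis=1
by_axis = df.drop('b', axis=1)
by_keyword = df.drop(columns='b')
print(by_axis.equals(by_keyword))  # True

# errors='ignore' silently skips labels that don't exist
safe = df.drop(columns=['b', 'missing'], errors='ignore')
print(safe.columns.tolist())  # ['a', 'c']
```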
Method 1 – Drop Columns from a DataFrame
One of the most common uses of Python drop() is removing unnecessary columns from your dataset. Let me show you how this works with a practical example:
import pandas as pd
# Sample sales data
sales_data = pd.DataFrame({
'Date': ['2023-01-15', '2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19'],
'Store': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
'Units_Sold': [12, 25, 18, 30, 15],
'Revenue': [14400, 20000, 9000, 36000, 12000],
'Notes': [None, 'Promotion', None, 'Holiday Sale', None]
})
# Drop single column
df_no_notes = sales_data.drop('Notes', axis=1)
print("After dropping 'Notes':", df_no_notes.columns.tolist())
# Drop multiple columns
df_simplified = sales_data.drop(['Notes', 'Store'], axis=1)
print("After dropping 'Notes' and 'Store':", df_simplified.columns.tolist())
# Drop using 'columns' parameter
df_sales_only = sales_data.drop(columns=['Notes', 'Store', 'Product'])
print("After keeping only sales data:", df_sales_only.columns.tolist())

Output:
After dropping 'Notes': ['Date', 'Store', 'Product', 'Units_Sold', 'Revenue']
After dropping 'Notes' and 'Store': ['Date', 'Product', 'Units_Sold', 'Revenue']
After keeping only sales data: ['Date', 'Units_Sold', 'Revenue']

When working with real datasets, I often need to drop columns that contain redundant information or aren’t relevant to my analysis. Using the axis=1 parameter (or the columns parameter) makes it clear that we’re operating on columns rather than rows.
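When column names aren’t known up front, you can also drop by position by passing a slice of df.columns. This is a small sketch on a reduced version of the sales data above, not part of the original example:

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Date': ['2023-01-15', '2023-01-16'],
    'Store': ['New York', 'Chicago'],
    'Units_Sold': [12, 18],
    'Revenue': [14400, 9000],
    'Notes': [None, 'Promotion'],
})

# Drop the last two columns by position instead of by name
trimmed = sales_data.drop(columns=sales_data.columns[-2:])
print(trimmed.columns.tolist())  # ['Date', 'Store', 'Units_Sold']
```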
Read Convert a DataFrame to JSON in Python
Method 2 – Drop Rows from a DataFrame
Removing rows is just as common as removing columns. Here’s how to do it:
# Drop a single row by index
df_without_row = sales_data.drop(0, axis=0)
print("After dropping row at index 0:", df_without_row.index.tolist())
# Drop multiple rows by index
df_without_multiple = sales_data.drop([1, 3], axis=0)
print("After dropping rows at index 1 and 3:", df_without_multiple.index.tolist())
# Drop a row using custom index (by date)
sales_data_indexed = sales_data.set_index('Date')
df_by_date = sales_data_indexed.drop('2023-01-17')
print("After dropping date '2023-01-17':", df_by_date.index.tolist())

Output:
After dropping row at index 0: [1, 2, 3, 4]
After dropping rows at index 1 and 3: [0, 2, 4]
After dropping date '2023-01-17': ['2023-01-15', '2023-01-16', '2023-01-18', '2023-01-19']

In my experience, dropping rows is particularly useful when dealing with outliers or specific records that could skew your analysis. The default axis value is 0, so you can omit it when dropping rows, but I recommend including it for code clarity.
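The same idea extends to dropping a whole range of rows: pass a slice of df.index instead of listing labels one by one. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'value': range(10, 20)})  # default integer index 0..9

# Drop the first three rows by passing a slice of the index
tail = df.drop(df.index[:3])
print(tail.index.tolist())  # [3, 4, 5, 6, 7, 8, 9]
```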
Method 3 – Use inplace Parameter for Direct Modification
Sometimes you want to modify your original DataFrame directly instead of creating a new one:
import pandas as pd
# Sample sales data
sales_data = pd.DataFrame({
'Date': ['2023-01-15', '2023-01-16', '2023-01-17'],
'Product': ['Laptop', 'Phone', 'Tablet'],
'Revenue': [14400, 20000, 9000]
})
# Without inplace - original DataFrame remains unchanged
df_copy = sales_data.drop(0) # drops row at index 0 from a copy
print("Original DataFrame (unchanged):")
print(sales_data)
# With inplace=True - original DataFrame is modified
sales_data.drop(0, inplace=True) # modifies sales_data by dropping row at index 0
print("\nModified DataFrame (row 0 dropped):")
print(sales_data)

Output:
Original DataFrame (unchanged):
Date Product Revenue
0 2023-01-15 Laptop 14400
1 2023-01-16 Phone 20000
2 2023-01-17 Tablet 9000
Modified DataFrame (row 0 dropped):
Date Product Revenue
1 2023-01-16 Phone 20000
2 2023-01-17 Tablet 9000

I’ve found that using inplace=True can be memory-efficient when working with large datasets, but be careful: this operation can’t be undone. I typically make a backup of my DataFrame before using inplace operations.
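The backup pattern mentioned above is simple: take a .copy() before the inplace drop, so you can recover if you drop the wrong thing. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'Revenue': [14400, 20000, 9000]})

# Keep a backup before an irreversible inplace drop
backup = df.copy()
df.drop(0, inplace=True)

print(0 in df.index)      # False - original was modified
print(0 in backup.index)  # True - backup is untouched
```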
Check out Convert a DataFrame to JSON Array in Python
Method 4 – Conditional Dropping with Boolean Indexing
While not directly using the drop() function, you can achieve similar results with boolean indexing:
# Creating a new DataFrame with US temperature data
weather_data = pd.DataFrame({
'City': ['New York', 'Los Angeles', 'Chicago', 'Miami', 'Denver'],
'Temperature': [45, 75, 32, 85, 50],
'Humidity': [65, 45, 70, 80, 30],
'Precipitation': [0.1, 0.0, 0.2, 0.5, 0.0]
})
# Keep only rows where Temperature is above 40
warm_cities = weather_data[weather_data['Temperature'] > 40]
# This is equivalent to dropping rows where Temperature <= 40
# warm_cities = weather_data.drop(weather_data[weather_data['Temperature'] <= 40].index)

This approach is incredibly useful when you need to filter data based on specific conditions. I use this method frequently when cleaning datasets to remove records that don’t meet certain criteria.
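You can verify that the two approaches really are equivalent: filtering with a boolean mask and dropping the index labels of the rows that fail the condition produce identical frames. A self-contained check on a trimmed version of the weather data:

```python
import pandas as pd

weather = pd.DataFrame({
    'City': ['New York', 'Chicago', 'Miami'],
    'Temperature': [45, 32, 85],
})

# Filter with boolean indexing
warm = weather[weather['Temperature'] > 40]

# Equivalent: drop the index labels of rows that fail the condition
warm_via_drop = weather.drop(weather[weather['Temperature'] <= 40].index)

print(warm.equals(warm_via_drop))  # True
print(warm['City'].tolist())       # ['New York', 'Miami']
```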
Method 5 – Drop Rows with Missing Values
Python’s drop() function works well with pandas’ built-in methods for handling missing values:
# Sample DataFrame with missing values in US stock data
stock_data = pd.DataFrame({
'Date': ['2023-01-15', '2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19'],
'Symbol': ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA'],
'Open': [170.33, 142.65, 239.05, 96.43, None],
'Close': [171.22, None, 240.22, 97.25, 122.40],
'Volume': [70521698, 31425612, None, 64633575, 177207916]
})
# Drop rows with any missing values
clean_data = stock_data.dropna()
# This is equivalent to using drop() on rows with NaN values
# clean_data = stock_data.drop(stock_data[stock_data.isna().any(axis=1)].index)

When preparing data for machine learning models or statistical analysis, I often need to decide how to handle missing values. The dropna() method is a convenient shortcut, but understanding how it relates to drop() helps you gain more control over your data cleaning process.
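For finer control, dropna() also accepts subset= (only consider certain columns) and thresh= (keep rows with at least that many non-null values). A short sketch on a reduced stock frame:

```python
import pandas as pd

stock = pd.DataFrame({
    'Symbol': ['AAPL', 'GOOGL', 'MSFT'],
    'Open': [170.33, None, 239.05],
    'Close': [171.22, 142.65, None],
})

# Drop only rows where 'Open' is missing
by_open = stock.dropna(subset=['Open'])
print(by_open['Symbol'].tolist())  # ['AAPL', 'MSFT']

# Keep rows with at least 3 non-null values
mostly_complete = stock.dropna(thresh=3)
print(mostly_complete['Symbol'].tolist())  # ['AAPL']
```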
Read pd.crosstab Function in Python
Method 6 – Drop Duplicates
Similar to handling missing values, pandas provides a method to drop duplicate rows:
# Sample DataFrame with duplicate entries in US customer data
customer_data = pd.DataFrame({
'ID': [101, 102, 103, 102, 104],
'Name': ['John Smith', 'Mary Johnson', 'Robert Brown', 'Mary Johnson', 'Susan Davis'],
'State': ['NY', 'CA', 'TX', 'CA', 'FL']
})
# Drop duplicate rows (keeps first occurrence)
unique_customers = customer_data.drop_duplicates()
# Drop duplicates based on specific columns
unique_by_state = customer_data.drop_duplicates(subset=['State'])

This is another specialized use case related to the drop() function. When analyzing customer data or transaction records, removing duplicates is often a critical preprocessing step.
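drop_duplicates() also takes a keep= parameter that controls which occurrence survives: 'first' (the default), 'last', or False to discard every duplicated row. A minimal sketch on made-up customer data:

```python
import pandas as pd

customers = pd.DataFrame({
    'ID': [101, 102, 102, 104],
    'Name': ['John', 'Mary', 'Mary', 'Susan'],
})

# keep='last' retains the final occurrence instead of the first
last_seen = customers.drop_duplicates(keep='last')
print(last_seen.index.tolist())  # [0, 2, 3]

# keep=False removes every duplicated row entirely
no_dupes = customers.drop_duplicates(keep=False)
print(no_dupes['ID'].tolist())  # [101, 104]
```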
Check out np.where in Pandas Python
Best Practices for Using drop()
Over my years of working with pandas, I’ve developed some best practices for using the drop() function effectively:
- Always verify your data before and after dropping: use .head(), .shape, or .info() to confirm you’ve removed the intended elements.
- Be cautious with inplace=True: it modifies your original data and can’t be undone. Consider creating a copy first.
- Use explicit axis naming: even though axis=0 is the default for rows, explicitly stating the axis makes your code more readable.
- Consider using the errors='ignore' parameter: this prevents your code from crashing if you try to drop labels that don’t exist.
- Chain methods efficiently: instead of dropping elements in multiple steps, you can often chain operations for cleaner code.
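The chaining practice from the last point can be sketched as a single readable pipeline (example data is hypothetical): each method returns a new DataFrame, so drop(), dropna(), and drop_duplicates() compose naturally.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, None],
    'b': [2, 2, 3],
    'junk': ['x', 'y', 'z'],
})

# One pipeline: drop a column, remove rows with NaN, then deduplicate
clean = (
    df.drop(columns='junk')
      .dropna()
      .drop_duplicates()
)
print(clean.shape)  # (1, 2)
```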
The drop() function in pandas is an essential tool for data manipulation. Whether you’re cleaning messy datasets, focusing your analysis on specific variables, or preparing data for visualization, mastering drop() will significantly improve your data workflow.
While there are specialized methods like dropna() and drop_duplicates() for specific scenarios, understanding the core drop() function gives you more flexibility and control over your data processing pipeline.
I hope this guide helps you use the pandas drop() function more effectively in your Python projects.
Pandas-related tutorials:
- Pandas GroupBy Without Aggregation Function in Python
- Pandas Merge Fill NAN with 0 in Python
- Python Pandas Write to Excel

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.