Recently, I was working on a data analysis project for a US retail company where I needed to clean thousands of customer records. The dataset had numerous missing values, duplicates, and outlier transactions that were skewing our analysis results. The challenge was figuring out how to efficiently remove these problematic rows.
Pandas, Python’s efficient data manipulation library, offers several elegant solutions to drop rows from DataFrames. In this article, I’ll walk you through five practical methods I’ve used countless times in my decade of Python development experience.
Let’s get into these techniques with some real-world examples.
Drop Rows in Python Pandas DataFrames
Let me show you some methods to drop rows in Python Pandas DataFrames.
Method 1: Drop Rows Using Index Labels
The simplest way to drop specific rows is by using their index labels with the drop() function in Python.
import pandas as pd
# Sample sales data from a US retail store
data = {
'Product': ['Laptop', 'Smartphone', 'Tablet', 'Headphones', 'Monitor'],
'Price': [1200, 800, 350, 150, 250],
'Stock': [45, 120, 80, 200, 30]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Drop rows at index 1 and 3
df_dropped = df.drop([1, 3])
print("\nDataFrame after dropping rows 1 and 3:")
print(df_dropped)
Output:
Original DataFrame:
Product Price Stock
0 Laptop 1200 45
1 Smartphone 800 120
2 Tablet 350 80
3 Headphones 150 200
4 Monitor 250 30
DataFrame after dropping rows 1 and 3:
Product Price Stock
0 Laptop 1200 45
2 Tablet 350 80
4 Monitor 250 30

This technique works perfectly when you know exactly which rows need to be removed. I often use this approach when dealing with known problematic data points.
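The same approach works when your DataFrame uses a custom index instead of the default integers: you pass the labels themselves. A small sketch with made-up product data; passing errors='ignore' tells drop() to skip labels that aren't present rather than raising a KeyError.

```python
import pandas as pd

# DataFrame indexed by product name instead of the default integer index
df = pd.DataFrame(
    {'Price': [1200, 800, 350], 'Stock': [45, 120, 80]},
    index=['Laptop', 'Smartphone', 'Tablet']
)

# Drop by label; errors='ignore' silently skips the missing 'Camera' label
df_dropped = df.drop(['Smartphone', 'Camera'], errors='ignore')
print(df_dropped.index.tolist())  # ['Laptop', 'Tablet']
```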
One important thing to note is that the drop() function doesn’t modify the original DataFrame by default. If you want to modify the original DataFrame, you can set inplace=True.
# Drop rows and modify the original DataFrame
df.drop([1, 3], inplace=True)
print("\nOriginal DataFrame after in-place drop:")
print(df)
Method 2: Filter Rows Using Boolean Conditions
When you need to drop rows based on certain conditions, boolean filtering in Python is your best option.
import pandas as pd
# Customer data with missing values
data = {
'Customer': ['John Smith', 'Maria Garcia', 'Robert Johnson', 'Sarah Williams', 'David Brown'],
'State': ['California', 'Texas', None, 'New York', 'Florida'],
'Purchase': [250, 180, 340, None, 420]
}
df = pd.DataFrame(data)
print("Original DataFrame with missing values:")
print(df)
# Keep only rows where State is not None
df_clean = df[df['State'].notna()]
print("\nDataFrame after removing rows with missing State:")
print(df_clean)
# More complex condition: keep only rows with a State and a Purchase above 200
condition = (df['State'].notna()) & (df['Purchase'] > 200)
df_filtered = df[condition]
print("\nDataFrame after applying complex filter:")
print(df_filtered)
Output:
Original DataFrame with missing values:
Customer State Purchase
0 John Smith California 250.0
1 Maria Garcia Texas 180.0
2 Robert Johnson None 340.0
3 Sarah Williams New York NaN
4 David Brown Florida 420.0
DataFrame after removing rows with missing State:
Customer State Purchase
0 John Smith California 250.0
1 Maria Garcia Texas 180.0
3 Sarah Williams New York NaN
4 David Brown Florida 420.0
DataFrame after applying complex filter:
Customer State Purchase
0 John Smith California 250.0
4 David Brown Florida 420.0

This method gives you incredible flexibility for filtering your data. I use it daily when cleaning datasets, especially when dealing with business rules like “remove all transactions below $100” or “keep only customers from certain states.”
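When a business rule is phrased as "remove", I sometimes negate the condition with the ~ operator instead of rewriting it as a "keep" rule; the code then reads like the rule itself. A quick sketch with made-up purchase amounts:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'B', 'C', 'D'],
    'Purchase': [250, 80, 340, 95]
})

# "Remove all transactions below $100" expressed directly with ~
df_kept = df[~(df['Purchase'] < 100)]
print(df_kept['Customer'].tolist())  # ['A', 'C']
```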
Method 3: Drop Rows with Missing Values
Handling missing values is a common challenge in data analysis. Pandas provides the dropna() method specifically for this purpose in Python.
import pandas as pd
import numpy as np
# Survey responses from different US regions
data = {
'Respondent': ['R001', 'R002', 'R003', 'R004', 'R005'],
'Age': [32, 45, np.nan, 28, 53],
'Income': [75000, np.nan, 120000, 65000, np.nan],
'Satisfaction': [4, 5, np.nan, 3, 4]
}
df = pd.DataFrame(data)
print("Original DataFrame with NaN values:")
print(df)
# Drop rows with any missing values
df_clean = df.dropna()
print("\nDataFrame after dropping all rows with any NaN:")
print(df_clean)
# Drop rows where all elements are missing
df_partial = df.dropna(how='all')
print("\nDataFrame after dropping rows where all values are NaN:")
print(df_partial)
# Drop rows where values are missing in specific columns
df_subset = df.dropna(subset=['Age', 'Satisfaction'])
print("\nDataFrame after dropping rows with NaN in Age or Satisfaction:")
print(df_subset)
The dropna() method is incredibly versatile. I’ve found the subset parameter particularly useful when working with datasets where some columns are more critical than others.
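Another dropna() option worth knowing is thresh, which keeps a row only if it has at least that many non-missing values. A small sketch with invented data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [np.nan, np.nan, 6],
    'C': [7, np.nan, 9]
})

# Keep only rows that have at least 2 non-NaN values
df_thresh = df.dropna(thresh=2)
print(df_thresh.index.tolist())  # [0, 2]
```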
Method 4: Drop Duplicate Rows
Duplicate records can significantly impact your analysis results. Python Pandas makes it easy to identify and remove them.
import pandas as pd
# Transaction data with duplicates
data = {
'Transaction_ID': ['T1001', 'T1002', 'T1001', 'T1003', 'T1004', 'T1004'],
'Customer_ID': ['C201', 'C305', 'C201', 'C410', 'C305', 'C305'],
'Amount': [120.50, 85.75, 120.50, 250.00, 75.20, 75.20]
}
df = pd.DataFrame(data)
print("Original DataFrame with duplicates:")
print(df)
# Drop completely duplicate rows
df_unique = df.drop_duplicates()
print("\nDataFrame after removing exact duplicates:")
print(df_unique)
# Drop rows that have the same Customer_ID and Amount
df_subset_unique = df.drop_duplicates(subset=['Customer_ID', 'Amount'])
print("\nDataFrame after removing rows with duplicate Customer_ID and Amount:")
print(df_subset_unique)
# Keep the last occurrence of duplicates instead of the first
df_keep_last = df.drop_duplicates(keep='last')
print("\nDataFrame keeping the last occurrence of duplicates:")
print(df_keep_last)
When processing transaction data, I’ve often needed to remove duplicate entries while keeping either the first or last occurrence. The keep parameter comes in handy for these scenarios.
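If you want to discard every copy of a duplicated row rather than keeping one occurrence, keep=False does that. A short sketch with the same kind of transaction data:

```python
import pandas as pd

df = pd.DataFrame({
    'Transaction_ID': ['T1', 'T2', 'T1', 'T3'],
    'Amount': [120.50, 85.75, 120.50, 250.00]
})

# keep=False drops all rows that have any exact duplicate
df_no_dupes = df.drop_duplicates(keep=False)
print(df_no_dupes['Transaction_ID'].tolist())  # ['T2', 'T3']
```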
Method 5: Drop Rows Using Query Method
For complex filtering conditions, the Python query() method offers a more readable and intuitive syntax.
import pandas as pd
# Real estate data from different US cities
data = {
'Property_ID': ['P101', 'P102', 'P103', 'P104', 'P105', 'P106'],
'City': ['New York', 'Los Angeles', 'Chicago', 'Miami', 'Seattle', 'Boston'],
'Price': [850000, 750000, 450000, 550000, 650000, 750000],
'Bedrooms': [2, 3, 4, 3, 2, 5],
'Size_sqft': [1200, 1800, 2200, 1600, 1100, 2400]
}
df = pd.DataFrame(data)
print("Original Real Estate DataFrame:")
print(df)
# Drop expensive small properties
df_filtered = df.query('Price > 600000 and Size_sqft < 1500')
print("\nDataFrame after dropping expensive small properties:")
print(df_filtered)
# Multiple city filter
df_cities = df.query('City in ["New York", "Los Angeles", "Miami"]')
print("\nDataFrame filtered for specific cities:")
print(df_cities)
# Complex price range filter
df_price_range = df.query('(Price >= 500000 and Price <= 700000) or Bedrooms > 4')
print("\nDataFrame with specific price range or large homes:")
print(df_price_range)
I find the query() method particularly helpful when working with multiple conditions that would otherwise make the code harder to read using traditional boolean filtering.
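query() can also reference local Python variables with the @ prefix, which keeps thresholds out of the query string itself. A sketch using made-up listing data:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'Miami'],
    'Price': [850000, 450000, 550000]
})

max_price = 600000
# The @ prefix lets the query string use the local max_price variable
df_affordable = df.query('Price <= @max_price')
print(df_affordable['City'].tolist())  # ['Chicago', 'Miami']
```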
Bonus Tip: Measure Performance Impact
When working with larger datasets, it’s essential to understand the performance implications of your row-dropping operations.
import pandas as pd
import numpy as np
import time
# Create a larger dataset (100,000 rows)
np.random.seed(42)
large_df = pd.DataFrame({
'ID': range(100000),
'Value1': np.random.randn(100000),
'Value2': np.random.randn(100000),
'Category': np.random.choice(['A', 'B', 'C', 'D'], 100000)
})
# Add some missing values
large_df.loc[large_df.sample(frac=0.1).index, 'Value1'] = np.nan
# Measure time for different operations
start_time = time.time()
filtered_df = large_df[large_df['Value1'] > 0]
print(f"Boolean filtering time: {time.time() - start_time:.4f} seconds")
start_time = time.time()
query_df = large_df.query('Value1 > 0')
print(f"Query method time: {time.time() - start_time:.4f} seconds")
start_time = time.time()
na_dropped_df = large_df.dropna()
print(f"Drop NA time: {time.time() - start_time:.4f} seconds")
In my experience, boolean filtering is often the fastest approach for simple conditions, while query() can be more efficient for complex conditions, especially on larger datasets.
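A single time.time() delta can be noisy; for more stable numbers, the standard-library timeit module averages each operation over many runs. A rough sketch on a smaller synthetic frame:

```python
import timeit
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Value1': np.random.randn(10_000)})

# timeit runs each statement `number` times and returns the total seconds
t_bool = timeit.timeit(lambda: df[df['Value1'] > 0], number=100)
t_query = timeit.timeit(lambda: df.query('Value1 > 0'), number=100)
print(f"boolean: {t_bool:.4f}s  query: {t_query:.4f}s")
```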
Best Practices When Dropping Rows
Based on my years of experience working with Pandas, here are some best practices I’ve found invaluable:
- Always make a copy before dropping rows if you need to preserve the original data
- Check your row count before and after operations to confirm that the expected number of rows was removed
- Consider using inplace=True carefully – it modifies your original DataFrame
- Chain operations when appropriate to improve readability
- Document your filtering logic so others (including your future self) understand why certain rows were removed
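The first two practices can be sketched as a small pattern I use routinely (the column name here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, None, 30, None]})

# 1. Work on a copy so the original data survives
df_work = df.copy()

# 2. Record the row count, drop, then verify how many rows went away
before = len(df_work)
df_work = df_work.dropna(subset=['Value'])
removed = before - len(df_work)
print(f"Removed {removed} of {before} rows")  # Removed 2 of 4 rows
```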
I hope you found this tutorial helpful for managing and cleaning your DataFrames. The methods I explained in this tutorial are dropping rows using index labels, filtering rows with boolean conditions, dropping rows with missing values, dropping duplicate rows, and dropping rows with the query() method.
You may like to read:
- Pandas Replace Multiple Values in Python
- Pandas Iterrows in Python
- Pandas Iterrows Update Value in Python

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last five years. During this time I have gained expertise in various Python libraries like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.