Replace Multiple Values in a Pandas DataFrame Based on Conditions

When working with data in Python, I often encounter situations where I need to replace values in a DataFrame based on specific conditions. Pandas makes this process easy, offering several methods to accomplish this task efficiently.

In this guide, I will show you multiple ways to replace values in a pandas DataFrame column based on conditions. These techniques are essential for data cleaning and transformation tasks that you’ll frequently face in real-world data analysis.

Let’s dive in and explore these methods with practical examples!

Conditional Value Replacement in Pandas

Before we get into the code, let’s understand what we’re trying to achieve. Often, we need to replace specific values in a column when they meet certain criteria.

For example, you might want to:

  • Replace all negative values with zero
  • Map state abbreviations to full state names
  • Convert categorical values to numeric codes
  • Fix inconsistent entries in your dataset

Pandas provides multiple approaches to handle these scenarios efficiently.

Method 1: Use loc[] for Conditional Replacement

The loc[] indexer is one of the simplest ways to replace values in pandas based on a condition.

import pandas as pd

# Sample DataFrame with US sales data
data = {
    'State': ['CA', 'NY', 'TX', 'FL', 'IL'],
    'Sales': [5000, -200, 3500, -150, 4200]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Replace negative sales values with 0
df.loc[df['Sales'] < 0, 'Sales'] = 0

print("\nDataFrame after replacing negative values:")
print(df)

Output:

Original DataFrame:
  State  Sales
0    CA   5000
1    NY   -200
2    TX   3500
3    FL   -150
4    IL   4200

DataFrame after replacing negative values:
  State  Sales
0    CA   5000
1    NY      0
2    TX   3500
3    FL      0
4    IL   4200

In this example, I used loc[] to identify rows where sales values are negative and replace them with zero. This method is intuitive and readable, making it perfect for simple conditional replacements.
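loc[] also scales to multiple conditions: combine boolean masks with & (and) or | (or), wrapping each comparison in parentheses. Here is a small sketch using the same sales data (the Flag column and the thresholds are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'NY', 'TX', 'FL', 'IL'],
    'Sales': [5000, -200, 3500, -150, 4200]
})

# Combine conditions with & (and) / | (or); each comparison needs parentheses
df['Flag'] = 'normal'
df.loc[(df['Sales'] > 0) & (df['Sales'] < 4000), 'Flag'] = 'review'

print(df['Flag'].tolist())  # ['normal', 'normal', 'review', 'normal', 'normal']
```

Only the row with Sales of 3500 satisfies both conditions, so only it gets flagged.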

Method 2: Use replace() with a Dictionary

The pandas replace() method allows you to replace multiple values at once using a dictionary mapping.

import pandas as pd

# DataFrame with US state abbreviations
data = {
    'State': ['CA', 'NY', 'TX', 'FL', 'IL', 'AZ'],
    'Revenue': [10000, 8500, 9200, 7500, 8000, 6500]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Create a mapping dictionary for state abbreviations to full names
state_mapping = {
    'CA': 'California',
    'NY': 'New York',
    'TX': 'Texas',
    'FL': 'Florida',
    'IL': 'Illinois',
    'AZ': 'Arizona'
}

# Replace state abbreviations with full names
df['State'] = df['State'].replace(state_mapping)

print("\nDataFrame after replacing state abbreviations:")
print(df)

Output:

Original DataFrame:
  State  Revenue
0    CA    10000
1    NY     8500
2    TX     9200
3    FL     7500
4    IL     8000
5    AZ     6500

DataFrame after replacing state abbreviations:
        State  Revenue
0  California    10000
1    New York     8500
2       Texas     9200
3     Florida     7500
4    Illinois     8000
5     Arizona     6500

This approach is excellent when you need to replace multiple distinct values with their corresponding replacements. The dictionary mapping makes the code clean and maintainable.
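replace() also accepts a nested dictionary whose outer keys are column names, which lets you apply different mappings to different columns in a single call. A quick sketch (the columns and codes here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'NY'],
    'Status': ['A', 'I']
})

# Outer keys select columns; inner dicts map old value -> new value
df = df.replace({
    'State': {'CA': 'California', 'NY': 'New York'},
    'Status': {'A': 'Active', 'I': 'Inactive'}
})

print(df)
```

Values not listed in the inner dictionaries are left untouched, so partial mappings are safe.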

Method 3: Use apply() with a Custom Function

For more complex conditional logic, the apply() method with a custom function gives you full flexibility.

import pandas as pd

# Sample DataFrame with US product ratings
data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Rating': [4.2, 3.5, 4.7, 2.8, 3.9]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Define a function to categorize ratings
def categorize_rating(rating):
    if rating >= 4.5:
        return 'Excellent'
    elif rating >= 4.0:
        return 'Good'
    elif rating >= 3.0:
        return 'Average'
    else:
        return 'Poor'

# Apply the function to create a new column
df['Rating_Category'] = df['Rating'].apply(categorize_rating)

print("\nDataFrame after adding rating categories:")
print(df)

Output:

Original DataFrame:
    Product  Rating
0    Laptop     4.2
1     Phone     3.5
2    Tablet     4.7
3   Monitor     2.8
4  Keyboard     3.9

DataFrame after adding rating categories:
    Product  Rating Rating_Category
0    Laptop     4.2            Good
1     Phone     3.5         Average
2    Tablet     4.7       Excellent
3   Monitor     2.8            Poor
4  Keyboard     3.9         Average

The apply() method is useful when your replacement logic involves multiple conditions or calculations. It allows you to encapsulate complex logic in a function.
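When the logic depends on more than one column, apply() can work row-wise with axis=1. A sketch of this pattern (the Stock column and the thresholds are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Rating': [4.2, 2.5, 4.8],
    'Stock': [0, 12, 5]
})

# axis=1 passes each row to the function, so multiple columns are available
def availability(row):
    if row['Stock'] == 0:
        return 'Out of stock'
    return 'Recommended' if row['Rating'] >= 4.0 else 'In stock'

df['Label'] = df.apply(availability, axis=1)
print(df['Label'].tolist())  # ['Out of stock', 'In stock', 'Recommended']
```

Keep in mind that row-wise apply() is the slowest variant, so reserve it for logic that truly needs several columns at once.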

Method 4: Use numpy.where() for Simple Conditions

For binary (if-else) conditions, numpy.where() provides a concise solution.

import pandas as pd
import numpy as np

# Sales data with different regions
data = {
    'Region': ['East', 'West', 'North', 'South', 'Central'],
    'Sales': [25000, 30000, 15000, 28000, 22000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Use numpy.where() to categorize sales performance
df['Performance'] = np.where(df['Sales'] > 25000, 'High', 'Regular')

print("\nDataFrame after adding performance category:")
print(df)

The numpy.where() function works like a vectorized if-else statement and is particularly efficient for large datasets.
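np.where() calls can also be nested to produce more than two outcomes, though past two or three tiers np.select() (Method 6) reads better. A sketch on the same sales figures (the tier thresholds are arbitrary):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Sales': [25000, 30000, 15000, 28000, 22000]})

# The inner np.where supplies the "else" branch of the outer one
df['Tier'] = np.where(df['Sales'] >= 28000, 'High',
             np.where(df['Sales'] >= 20000, 'Medium', 'Low'))

print(df['Tier'].tolist())  # ['Medium', 'High', 'Low', 'High', 'Medium']
```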

Method 5: Use mask() for Value Replacement

The pandas mask() method provides another approach for conditional replacement.

import pandas as pd

# Customer data with purchase amounts
data = {
    'Customer': ['John', 'Alice', 'Bob', 'Carol', 'Dave'],
    'Purchase': [120, 350, 80, 420, 190]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# mask() writes the replacement wherever the condition is True
df['Discount'] = df['Purchase'].mask(df['Purchase'] > 300, 'Eligible')
df['Discount'] = df['Discount'].mask(df['Purchase'] <= 300, 'Not Eligible')

print("\nDataFrame after adding discount eligibility:")
print(df)

The mask() method replaces values where the condition is True with the specified value, leaving others unchanged.
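mask() has a mirror image, where(), which keeps values where the condition is True and replaces them where it is False. A short sketch of the pair (the cap and floor values are arbitrary):

```python
import pandas as pd

s = pd.Series([120, 350, 80, 420, 190])

# mask(): replace where the condition IS met (cap values above 300)
capped = s.mask(s > 300, 300)

# where(): keep where the condition is met, replace elsewhere (floor at 100)
floored = s.where(s > 100, 100)

print(capped.tolist())   # [120, 300, 80, 300, 190]
print(floored.tolist())  # [120, 350, 100, 420, 190]
```

Choosing between the two is purely a matter of which direction reads more naturally for your condition.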

Method 6: Use np.select() for Multiple Conditions

When dealing with multiple conditions, numpy.select() offers a cleaner alternative to nested np.where() calls.

import pandas as pd
import numpy as np

# Temperature data for US cities
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Temp_F': [45, 75, 32, 80, 95]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Define conditions and choices
conditions = [
    (df['Temp_F'] < 32),
    (df['Temp_F'] >= 32) & (df['Temp_F'] < 50),
    (df['Temp_F'] >= 50) & (df['Temp_F'] < 70),
    (df['Temp_F'] >= 70) & (df['Temp_F'] < 90),
    (df['Temp_F'] >= 90)
]

choices = ['Freezing', 'Cold', 'Moderate', 'Warm', 'Hot']

# Create a new column using np.select
df['Temp_Category'] = np.select(conditions, choices, default='Unknown')

print("\nDataFrame after adding temperature categories:")
print(df)

This approach is perfect when you have multiple non-overlapping conditions, and each condition maps to a specific value.
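One detail worth knowing: np.select() evaluates the conditions in order and the first match wins, so the same categories can be written more compactly without explicit upper bounds:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Temp_F': [45, 75, 32, 80, 95]})

# The first matching condition wins, so upper bounds are implicit
conditions = [
    df['Temp_F'] >= 90,
    df['Temp_F'] >= 70,
    df['Temp_F'] >= 50,
    df['Temp_F'] >= 32,
]
choices = ['Hot', 'Warm', 'Moderate', 'Cold']
df['Temp_Category'] = np.select(conditions, choices, default='Freezing')

print(df['Temp_Category'].tolist())  # ['Cold', 'Warm', 'Cold', 'Warm', 'Hot']
```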

Performance Considerations: Replace Multiple Values in Pandas

When working with larger datasets, the method you choose can significantly impact performance. Here are some general guidelines:

  • For simple conditions, loc[] and numpy.where() are usually the most efficient
  • For multiple conditions, numpy.select() performs better than nested conditionals
  • The apply() method can be slower for large datasets as it applies Python-level operations

When performance is critical, I recommend testing different approaches with your specific dataset.
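As a rough illustration of the gap, here is a small timing sketch on synthetic data (absolute times depend entirely on your machine; the data size and threshold are arbitrary):

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Sales': rng.integers(-1000, 10000, size=200_000)})

# Vectorized replacement with np.where
start = time.perf_counter()
vectorized = np.where(df['Sales'] < 0, 0, df['Sales'])
t_where = time.perf_counter() - start

# Element-wise replacement with apply
start = time.perf_counter()
applied = df['Sales'].apply(lambda x: 0 if x < 0 else x)
t_apply = time.perf_counter() - start

# Both produce identical results; np.where is typically much faster
print(f"np.where: {t_where:.4f}s  apply: {t_apply:.4f}s")
```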

Real-World Application: Clean US Sales Data

Let’s put these techniques together in a practical example of cleaning sales data:

import pandas as pd
import numpy as np

# Create a messy sales dataset
data = {
    'State': ['CA', 'ny', 'Texas', 'FL', 'il', 'AZ', 'N/A', 'CA'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet', 'Monitor', 'Phone'],
    'Sales': [5000, -200, 3500, -150, 4200, 2800, 0, 3900],
    'Status': ['Completed', 'pending', 'COMPLETED', 'canceled', 'Pending', 'completed', 'PENDING', 'COMPLETED']
}

df = pd.DataFrame(data)
print("Original messy DataFrame:")
print(df)

# 1. Standardize state names
state_mapping = {
    'CA': 'California',
    'ny': 'New York',
    'Texas': 'Texas',  # Already correct
    'FL': 'Florida',
    'il': 'Illinois',
    'AZ': 'Arizona',
    'N/A': np.nan  # Replace N/A with NaN
}

df['State'] = df['State'].replace(state_mapping)

# 2. Replace negative sales with 0
df.loc[df['Sales'] < 0, 'Sales'] = 0

# 3. Standardize status values (convert all to title case)
df['Status'] = df['Status'].str.title()

# 4. Add a sales category
conditions = [
    (df['Sales'] == 0),
    (df['Sales'] > 0) & (df['Sales'] < 1000),
    (df['Sales'] >= 1000) & (df['Sales'] < 3000),
    (df['Sales'] >= 3000) & (df['Sales'] < 5000),
    (df['Sales'] >= 5000)
]

choices = ['No Sale', 'Low', 'Medium', 'High', 'Excellent']
df['Sales_Category'] = np.select(conditions, choices, default='Unknown')

print("\nCleaned DataFrame:")
print(df)

This comprehensive example demonstrates how to use multiple methods to clean and transform a messy dataset into a consistent, analysis-ready format.

Conclusion

Replacing values based on conditions is a fundamental data cleaning task that you’ll encounter regularly when working with pandas. With the methods I’ve outlined, you now have a toolkit for handling a wide range of conditional replacement scenarios.

Each approach has its strengths:

  • Use loc[] for straightforward conditions
  • Use replace() with dictionaries for mapping values
  • Use apply() for complex logic
  • Use numpy.where() for binary conditions
  • Use mask() for targeted replacements
  • Use numpy.select() for multiple conditions

By choosing the right method for your specific needs, you can make your data cleaning code more efficient and readable.
