Update Column Values In Python Pandas DataFrame

As a developer working with data in Python, I often need to modify values in a DataFrame column. Whether it’s correcting errors, applying transformations, or updating based on conditions, knowing how to update column values efficiently is an essential skill for any data professional.

In this tutorial, I’ll walk you through various methods to update column values in a Pandas DataFrame. I’ve used these techniques countless times in real-world data analysis projects.

Let’s dive in and explore these data manipulation techniques.

Method 1: Update Column Values Using loc[]

The loc[] accessor in Pandas is one of the easiest ways to update values in a DataFrame column.

Here’s a simple example using sales data from various US states:

import pandas as pd

# Create a sample DataFrame with sales data
data = {
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'Sales': [120000, 95000, 110000, 88000, 72000],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

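# Update all values in the Sales column
# Cast to float first: the 10% increase produces non-integer values,
# and recent pandas versions warn when loc[] has to change an int column's dtype
df['Sales'] = df['Sales'].astype(float)
df.loc[:, 'Sales'] = df['Sales'] * 1.1  # Increase all sales by 10%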

print("\nDataFrame after updating all Sales values:")
print(df)

Output:

DataFrame after updating all Sales values:
        State     Sales Quarter
0  California  132000.0      Q1
1       Texas  104500.0      Q1
2    New York  121000.0      Q2
3     Florida   96800.0      Q2
4    Illinois   79200.0      Q3

In this example, I’ve updated all values in the ‘Sales’ column by increasing them by 10%. In loc[], the colon (:) selects all rows, and ‘Sales’ specifies the column to update.
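The same row-and-column selection also accepts a single value or a range of row labels. Here is a minimal sketch, assuming a small DataFrame with the default integer index; the replacement values ('Q4' and 100000.0) are made up for illustration:

```python
import pandas as pd

# Small illustrative DataFrame (Sales stored as floats)
df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York'],
    'Sales': [120000.0, 95000.0, 110000.0],
    'Quarter': ['Q1', 'Q1', 'Q2'],
})

# A single value on the right-hand side is applied to every selected row
df.loc[:, 'Quarter'] = 'Q4'

# Label-based slices in loc[] include both endpoints, so rows 0 and 1 are updated
df.loc[0:1, 'Sales'] = 100000.0

print(df)
```

Note that loc[] slices by label, so 0:1 covers two rows here, unlike ordinary Python slicing.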

Method 2: Conditional Updates Using loc[]

Often, I need to update only the values that meet certain conditions. The loc[] accessor handles this as well when you pass it a boolean condition instead of a colon:

# Reset our DataFrame
data = {
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'Sales': [120000, 95000, 110000, 88000, 72000],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3']
}
df = pd.DataFrame(data)

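# Update sales values only for Q1
# Cast to float first: the 15% increase produces non-integer values
df['Sales'] = df['Sales'].astype(float)
q1 = df['Quarter'] == 'Q1'  # Boolean mask that is True for Q1 rows
df.loc[q1, 'Sales'] = df.loc[q1, 'Sales'] * 1.15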

print("DataFrame after updating Q1 Sales values:")
print(df)

Output:

DataFrame after updating Q1 Sales values:
        State     Sales Quarter
0  California  138000.0      Q1
1       Texas  109250.0      Q1
2    New York  110000.0      Q2
3     Florida   88000.0      Q2
4    Illinois   72000.0      Q3

Here, I’ve applied a 15% increase only to sales from Q1. This is extremely useful when you need to make targeted updates based on specific conditions.
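Conditions can also be combined. The sketch below assumes the same sales data and a hypothetical rule that raises Q1 sales below 100,000 to a floor of 100,000; the rule itself is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'Sales': [120000.0, 95000.0, 110000.0, 88000.0, 72000.0],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3'],
})

# Wrap each condition in parentheses; use & for AND and | for OR
mask = (df['Quarter'] == 'Q1') & (df['Sales'] < 100000)

# Only Texas is both in Q1 and below 100,000
df.loc[mask, 'Sales'] = 100000.0

print(df)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.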

Method 3: Use replace() Method

The replace() method in Pandas is handy when you want to swap specific values in a column for new ones:

# Create a new DataFrame
data = {
    'State': ['California', 'Texas', 'New York', 'Florida', 'Illinois'],
    'Region': ['West', 'South', 'Northeast', 'South', 'Midwest']
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Replace 'South' with 'Southern'
df['Region'] = df['Region'].replace('South', 'Southern')

print("\nDataFrame after replacing values:")
print(df)

You can also replace multiple values at once:

# Replace multiple values
df['Region'] = df['Region'].replace({
    'West': 'Western',
    'Northeast': 'Eastern',
    'Midwest': 'Central'
})

print("\nDataFrame after replacing multiple values:")
print(df)

The replace() method is particularly useful when dealing with categorical data or when you need to standardize values.
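As a sketch of the standardizing case, replace() also accepts a list of values that should all map to one replacement. The inconsistent spellings below are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Texas', 'Florida', 'Georgia', 'California'],
    'Region': ['south', 'South', 'S', 'West'],
})

# Every value in the list is replaced by the single value 'South'.
# Matching is done on whole cell values, not on substrings.
df['Region'] = df['Region'].replace(['south', 'S'], 'South')

print(df)
```

Because whole values are compared by default, the 'S' entry does not affect the existing 'South' value.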

Method 4: Use apply() Function for Complex Updates

When I need to apply a custom function to update values, the apply() method in Pandas is my go-to solution:

import pandas as pd

# Create a DataFrame with product prices
data = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [1200, 800, 350, 250, 80]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Define a function to apply discount based on price
def apply_discount(price):
    if price > 1000:
        return price * 0.85  # 15% discount
    elif price > 500:
        return price * 0.9   # 10% discount
    elif price > 200:
        return price * 0.95  # 5% discount
    else:
        return price         # No discount

# Apply the function to the Price column
df['Price'] = df['Price'].apply(apply_discount)

print("\nDataFrame after applying discounts:")
print(df)

Output:

Original DataFrame:
      Product  Price
0      Laptop   1200
1  Smartphone    800
2      Tablet    350
3     Monitor    250
4    Keyboard     80

DataFrame after applying discounts:
      Product   Price
0      Laptop  1020.0
1  Smartphone   720.0
2      Tablet   332.5
3     Monitor   237.5
4    Keyboard    80.0

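This method suits transformations that are too complex for a single vectorized expression. Keep in mind that apply() calls the function once per value, so it is usually slower than vectorized operations on large columns.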
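If the custom function needs extra parameters, apply() on a Series forwards additional keyword arguments to it. A minimal sketch, where discount_above, threshold, and rate are hypothetical names chosen for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Smartphone', 'Keyboard'],
    'Price': [1200, 800, 80],
})

def discount_above(price, threshold, rate):
    """Reduce price by rate when it is above threshold; otherwise keep it."""
    if price > threshold:
        return price * (1 - rate)
    return float(price)

# Extra keyword arguments are passed through to discount_above for every value
df['Price'] = df['Price'].apply(discount_above, threshold=500, rate=0.25)

print(df)
```

This keeps the thresholds out of the function body, so the same function can be reused with different settings.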

Method 5: Use assign() for Creating Updated Columns

The assign() method in Pandas returns a new DataFrame with the added or updated columns and leaves the original unchanged:

# Create a DataFrame with temperature data
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Temp_F': [75, 85, 68, 90, 105]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Create a new DataFrame with a converted temperature column
df_updated = df.assign(Temp_C=lambda x: (x['Temp_F'] - 32) * 5/9)

print("\nUpdated DataFrame with new column:")
print(df_updated)

This approach is useful when you want to preserve the original DataFrame and create a new one with the updates.
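To illustrate that the original stays untouched, here is a small sketch with made-up temperatures. It also shows that a later argument to assign() can refer to a column created earlier in the same call; Is_Hot and its 30 degree cutoff are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Seattle', 'Phoenix'],
    'Temp_F': [50, 95],
})

df_updated = df.assign(
    # Convert Fahrenheit to Celsius
    Temp_C=lambda x: (x['Temp_F'] - 32) * 5 / 9,
    # This argument can use Temp_C because it was defined just above
    Is_Hot=lambda x: x['Temp_C'] > 30,
)

print(df.columns.tolist())          # The original still has only City and Temp_F
print(df_updated)
```

Passing the name of an existing column to assign() would replace that column in the returned copy instead of adding a new one.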

Method 6: Update Using numpy.where() for Conditional Logic

For complex conditional updates, combining Pandas with NumPy’s where() function is extremely effective:

import numpy as np

# Create a DataFrame with student scores
data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Score': [85, 92, 78, 65, 89]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Update grades based on scores
df['Grade'] = np.where(df['Score'] >= 90, 'A', 
               np.where(df['Score'] >= 80, 'B',
               np.where(df['Score'] >= 70, 'C',
               np.where(df['Score'] >= 60, 'D', 'F'))))

print("\nDataFrame with grades:")
print(df)

This method allows for multiple conditions and outcomes in a concise format.
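When the nesting gets deep, NumPy's select() function is an alternative worth considering. This is a different function from where(), shown here as a sketch on the same student scores:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Score': [85, 92, 78, 65, 89],
})

# Conditions are checked in order; the first one that is True wins
conditions = [
    df['Score'] >= 90,
    df['Score'] >= 80,
    df['Score'] >= 70,
    df['Score'] >= 60,
]
grades = ['A', 'B', 'C', 'D']

# default is used for rows where no condition matches
df['Grade'] = np.select(conditions, grades, default='F')

print(df)
```

The conditions and their outcomes sit side by side in two lists, which tends to be easier to read and extend than four nested where() calls.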

Method 7: Update Multiple Columns Using loc[]

Sometimes I need to update multiple columns at once. The loc[] accessor works for this too when you pass it a list of column names:

# Create a DataFrame with product data
data = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [1200, 800, 350, 250, 80],
    'Stock': [10, 25, 15, 8, 30]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Update both price and stock for specific products
df.loc[df['Product'].isin(['Laptop', 'Smartphone']), ['Price', 'Stock']] = [[1100, 15], [750, 35]]

print("\nDataFrame after updating multiple columns:")
print(df)

This is particularly useful when you need to update related fields simultaneously.
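One detail to watch: the nested list is matched to the selected rows in the order they appear in the DataFrame, not in the order of the names passed to isin(). A short sketch with the same product data:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Smartphone', 'Tablet'],
    'Price': [1200, 800, 350],
    'Stock': [10, 25, 15],
})

# The names are listed in a different order than the DataFrame rows
mask = df['Product'].isin(['Smartphone', 'Laptop'])

# The first inner list still goes to Laptop (row 0), the second to Smartphone (row 1)
df.loc[mask, ['Price', 'Stock']] = [[1100, 15], [750, 35]]

print(df)
```

If the row order is not known in advance, updating one product at a time is a safer choice.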

Method 8: Use update() Method

The update() method in Pandas modifies a DataFrame in place with values from another DataFrame, matching rows by index and columns by name:

# Create our main DataFrame
data1 = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [1200, 800, 350, 250, 80]
}
df1 = pd.DataFrame(data1)

# Create a DataFrame with updated prices
data2 = {
    'Product': ['Laptop', 'Smartphone'],
    'Price': [1150, 780]
}
df2 = pd.DataFrame(data2)

# Set the index to Product for both DataFrames
df1.set_index('Product', inplace=True)
df2.set_index('Product', inplace=True)

print("Original DataFrame:")
print(df1)

# Update df1 with values from df2
df1.update(df2)

print("\nUpdated DataFrame:")
print(df1.reset_index())

This method is particularly useful when you have a separate dataset with updates that you want to apply to your main DataFrame.
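A useful property of update() is that missing values (NaN) in the second DataFrame are skipped, so a partial update does not wipe out existing data. A minimal sketch with made-up prices and stock levels, stored as floats:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'Price': [1200.0, 800.0, 350.0], 'Stock': [10.0, 25.0, 15.0]},
    index=['Laptop', 'Smartphone', 'Tablet'],
)

# NaN marks the cells that should be left as they are
df2 = pd.DataFrame(
    {'Price': [1150.0, np.nan], 'Stock': [np.nan, 30.0]},
    index=['Laptop', 'Smartphone'],
)

# update() changes df1 in place and returns None
df1.update(df2)

print(df1)
```

Only the Laptop price and the Smartphone stock change; every other cell, including the whole Tablet row, keeps its original value.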

I hope you found these methods helpful for updating column values in Python Pandas DataFrames. Each approach has its strengths, and the best one to use depends on your specific needs and the complexity of your data manipulation task.

One of the best things about Pandas is its flexibility: you can choose from multiple approaches to achieve the same result, which lets you pick the one that’s most readable and efficient for your particular situation.
