Pandas iterrows() Method

While working on a data analysis project, I needed to process each row in a DataFrame individually. I found myself using the iterrows() method frequently. It’s one of those fundamental Pandas tools that every data analyst should know. There are several ways to iterate through a DataFrame, and iterrows() offers a simple approach for many common tasks.

In this tutorial, I’ll cover everything you need to know about using iterrows() in Pandas, including best practices, performance considerations, and alternatives.

Pandas iterrows()

The iterrows() method is a built-in Pandas DataFrame method that lets you iterate through a DataFrame one row at a time. It returns an iterator that yields (index, row) pairs, where index is the row’s index label and row is a Series holding that row’s data.

Think of iterrows() as a way to process your DataFrame one row at a time, similar to how you might use a for loop to go through items in a list.
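To see what each iteration actually yields, here is a minimal sketch that pulls the first pair out of the iterator by hand. The two-row DataFrame and the variable names are made up for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (values chosen only for this sketch)
df = pd.DataFrame({
    'State': ['California', 'Texas'],
    'Population': [39.51, 29.00],
})

# iterrows() returns a generator; next() pulls the first (index, row) pair
index, row = next(df.iterrows())

print(index)               # the row's index label
print(type(row).__name__)  # each row arrives as a pandas Series
print(row['State'])        # values are looked up by column name
```

A for loop over df.iterrows() simply repeats this unpacking step for every row.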

Here’s a basic example of how iterrows() works:

import pandas as pd

# Create a simple DataFrame with US state data
data = {
    'State': ['California', 'Texas', 'Florida', 'New York'],
    'Population': [39.51, 29.00, 21.48, 19.45],
    'Capital': ['Sacramento', 'Austin', 'Tallahassee', 'Albany']
}

df = pd.DataFrame(data)

# Use iterrows() to iterate through each row
for index, row in df.iterrows():
    print(f"Row {index}: {row['State']} has a population of {row['Population']} million.")

This code outputs:

Row 0: California has a population of 39.51 million.
Row 1: Texas has a population of 29.0 million.
Row 2: Florida has a population of 21.48 million.
Row 3: New York has a population of 19.45 million.

Use iterrows() in Pandas

Let me show you some practical examples of using iterrows() in real-world scenarios.

Example 1: Perform Calculations on Rows

You can use iterrows() in Python Pandas to perform row-wise calculations, such as computing commissions based on total sales data.

import pandas as pd

# Sales data for a US retail company
data = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Headphones'],
    'Price': [1200, 800, 400, 150],
    'Units_Sold': [5, 10, 8, 25]
}

df = pd.DataFrame(data)

# Calculate total sales for each product (vectorized, no loop needed)
df['Total_Sales'] = df['Price'] * df['Units_Sold']

# Use iterrows() to calculate and print commission (5% of sales)
for index, row in df.iterrows():
    commission = row['Total_Sales'] * 0.05
    print(f"Product: {row['Product']}, Commission: ${commission:.2f}")

Output:

Product: Laptop, Commission: $300.00
Product: Smartphone, Commission: $400.00
Product: Tablet, Commission: $160.00
Product: Headphones, Commission: $187.50

This method is useful for applying custom logic to each row, especially when working with financial or transactional datasets.

Example 2: Conditional Processing

You can use iterrows() to apply custom logic to each row based on multiple conditions, such as weather thresholds.

import pandas as pd

# Weather data for US cities
weather_data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami', 'Denver'],
    'Temperature': [45, 75, 32, 85, 60],
    'Humidity': [65, 45, 70, 80, 30],
    'Precipitation': [0.2, 0.0, 0.4, 0.3, 0.1]
}

df = pd.DataFrame(weather_data)

# Process each row based on conditions
for index, row in df.iterrows():
    if row['Temperature'] > 70:
        condition = "Warm"
    elif row['Temperature'] < 40:
        condition = "Cold"
    else:
        condition = "Moderate"

    if row['Precipitation'] > 0.2:
        rain_status = "Expect rain"
    else:
        rain_status = "Mostly dry"

    print(f"{row['City']}: {condition}, {rain_status}")

Output:

New York: Moderate, Mostly dry
Los Angeles: Warm, Mostly dry
Chicago: Cold, Expect rain
Miami: Warm, Expect rain
Denver: Moderate, Mostly dry

This approach is effective for generating descriptive labels or alerts from raw data using condition-based processing.

Example 3: Modify DataFrame Values

You can also use iterrows() to make changes to your DataFrame, although this isn’t the most efficient approach (more on that later):

import pandas as pd

# Employee data
data = {
    'Name': ['John Smith', 'Sarah Johnson', 'Michael Brown', 'Jessica Davis'],
    'Department': ['Sales', 'Marketing', 'IT', 'HR'],
    'Salary': [65000.0, 72000.0, 88000.0, 62000.0]  # floats, so the raised salary fits the column's dtype
}

df = pd.DataFrame(data)

# Give a 10% raise to employees in IT department
for index, row in df.iterrows():
    if row['Department'] == 'IT':
        df.at[index, 'Salary'] = row['Salary'] * 1.10

print(df)

Output:

            Name Department   Salary
0     John Smith      Sales  65000.0
1  Sarah Johnson  Marketing  72000.0
2  Michael Brown         IT  96800.0
3  Jessica Davis         HR  62000.0

Using iterrows() lets you modify DataFrame values based on custom conditions, but vectorized methods are generally more efficient for larger datasets.
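One detail worth checking for yourself: the row you get from iterrows() is a temporary object, so for a DataFrame with mixed column types, assigning to it does not change the DataFrame. That is why the example above writes back through df.at. A minimal sketch, using a made-up one-row DataFrame:

```python
import pandas as pd

# One-row DataFrame with mixed column types (illustrative data)
df = pd.DataFrame({'Name': ['Michael Brown'], 'Salary': [88000.0]})

for index, row in df.iterrows():
    # This changes only the temporary row Series, not the DataFrame
    row['Salary'] = 0.0

# The stored value is untouched
print(df.at[0, 'Salary'])

# Writing through the DataFrame itself does take effect
df.at[0, 'Salary'] = 90000.0
print(df.at[0, 'Salary'])
```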

Performance Considerations

While iterrows() is convenient, it’s important to understand its performance limitations. When working with large DataFrames, iterrows() can be significantly slower than other methods.

The main reason is that iterrows() builds a new Series object for every row, which adds Python-level overhead on each iteration. Additionally, any changes made to the DataFrame inside the loop require one assignment per row, which is inefficient for bulk operations.
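A side effect of packing each row into a single Series is that values from columns with different types are converted to one common type. The sketch below uses made-up column names (units, ratio) to show an integer column turning into floats inside the row, while itertuples() leaves the value as an integer.

```python
import pandas as pd

# Illustrative DataFrame: one integer column, one float column
df = pd.DataFrame({'units': [1, 2], 'ratio': [0.5, 1.5]})

print(df['units'].dtype)         # an integer dtype in the DataFrame

# The row Series needs a single dtype, so the integer is upcast to float
_, first_row = next(df.iterrows())
print(first_row.dtype)
print(type(first_row['units']))

# itertuples() returns each value with its own type instead
first_tuple = next(df.itertuples(index=False))
print(type(first_tuple.units))
```

If your logic depends on integer values staying integers, this is one more reason to look at the alternatives to iterrows().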

Here’s a performance comparison:

import pandas as pd
import numpy as np
import time

# Create a large DataFrame
large_df = pd.DataFrame({
    'A': np.random.rand(100000),
    'B': np.random.rand(100000)
})

# time.perf_counter() is a high-resolution clock intended for timing code

# Method 1: Using iterrows() (slow)
start_time = time.perf_counter()
result = 0
for index, row in large_df.iterrows():
    result += row['A'] * row['B']
print(f"iterrows() time: {time.perf_counter() - start_time:.4f} seconds")

# Method 2: Using vectorized operations (fast)
start_time = time.perf_counter()
result = (large_df['A'] * large_df['B']).sum()
print(f"Vectorized operation time: {time.perf_counter() - start_time:.4f} seconds")

Alternatives to iterrows()

Because of the performance limitations, here are some better alternatives to iterrows() depending on your use case:

1. Vectorized Operations

For mathematical operations, always prefer vectorized operations:

# Instead of:
for index, row in df.iterrows():
    df.at[index, 'C'] = row['A'] + row['B']

# Do this:
df['C'] = df['A'] + df['B']

2. apply() Method

The apply() method with axis=1 is more concise than iterrows() for row-wise operations and can be faster, although it still calls a Python function once per row:

# Using apply() on rows
df['full_name'] = df.apply(lambda row: row['first_name'] + ' ' + row['last_name'], axis=1)
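The one-liner above assumes a DataFrame with first_name and last_name columns. Here is a self-contained sketch with made-up names that runs it next to the vectorized equivalent, so you can compare the two. The full_name_vec column is a name I chose only for this comparison.

```python
import pandas as pd

# Illustrative data; the column names match the snippet above
df = pd.DataFrame({
    'first_name': ['John', 'Sarah'],
    'last_name': ['Smith', 'Johnson'],
})

# Row-wise: the lambda is called once per row
df['full_name'] = df.apply(
    lambda row: row['first_name'] + ' ' + row['last_name'], axis=1
)

# Vectorized: the same result from whole-column string concatenation
df['full_name_vec'] = df['first_name'] + ' ' + df['last_name']

print(df)
```

When a column-level expression like the second one exists, it is usually the simpler choice. apply() is most useful when the per-row logic is too involved to express that way.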

3. itertuples()

If you need to iterate through rows, itertuples() is faster than iterrows() because it returns named tuples instead of Series objects:

for row in df.itertuples():
    print(row.Name, row.Department, row.Salary)
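A slightly fuller sketch, using a cut-down version of the employee data from Example 3, shows two things the short loop above does not: the index is available as row.Index, and passing index=False with name=None gives plain tuples that you can unpack directly.

```python
import pandas as pd

# Cut-down employee data for illustration
df = pd.DataFrame({
    'Name': ['John Smith', 'Michael Brown'],
    'Department': ['Sales', 'IT'],
    'Salary': [65000, 88000],
})

# Default: named tuples, with the index label exposed as row.Index
for row in df.itertuples():
    print(row.Index, row.Name, row.Salary)

# index=False drops the index; name=None yields plain tuples
for name, department, salary in df.itertuples(index=False, name=None):
    print(name, department, salary)
```

Attribute access such as row.Name works when the column name is a valid Python identifier. For other column names, unpacking plain tuples as in the second loop is the more predictable option.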

4. loc or iloc for Batch Operations

For updating values based on a condition, select the matching rows with loc and a boolean mask (or by position with iloc) and assign to them in one step:

# Instead of:
for index, row in df.iterrows():
    if row['Department'] == 'IT':
        df.at[index, 'Salary'] = row['Salary'] * 1.10

# Do this (assuming Salary is stored as a float column):
df.loc[df['Department'] == 'IT', 'Salary'] *= 1.10
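To try the loc version on its own, here is a runnable sketch with a small made-up employee table. Storing Salary as floats and rounding to cents are my own choices for this sketch, made so that the multiplied value fits the column and prints cleanly.

```python
import pandas as pd

# Illustrative employee data; Salary is stored as floats on purpose
df = pd.DataFrame({
    'Name': ['John Smith', 'Michael Brown'],
    'Department': ['Sales', 'IT'],
    'Salary': [65000.0, 88000.0],
})

# Boolean mask marking the rows to update
is_it = df['Department'] == 'IT'

# Update every matching row in one assignment, rounded to cents
df.loc[is_it, 'Salary'] = (df.loc[is_it, 'Salary'] * 1.10).round(2)

print(df)
```

Naming the mask (is_it here) keeps the condition readable and lets you reuse it on both sides of the assignment.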

While iterrows() has its place in the Pandas toolkit, it’s important to choose the right tool for each job. For small DataFrames or one-off operations, iterrows() is perfectly fine.

But for working with larger datasets or performance-critical applications, consider using the alternatives I’ve outlined.
