How to Use the Pandas Apply Function to Each Row

If you have been working with data in Python for a while, you know that the real magic happens when you start transforming your datasets.

In my years of experience as a developer, I have often found myself needing to perform a calculation or a logic check that goes beyond simple column addition.

This is exactly where the Pandas apply() function becomes your best friend. It allows you to take a custom function and run it across every single row of your DataFrame.

Whether you are calculating tax rates for different US states or cleaning up customer names, apply() is a versatile tool that I use almost daily.

In this tutorial, I will show you exactly how to use the apply() function on each row, along with some more efficient alternatives I’ve picked up over the years.

The Basic Syntax of Pandas Apply on Rows

Before we dive into the complex examples, let’s look at the basic structure.

When you want to apply something to a row, you must specify axis=1. By default, Pandas uses axis=0 and passes each column to your function instead, which is a mistake I see beginners make all the time.

Here is the general way I write it: df.apply(your_function, axis=1)
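To make the difference between the two axes concrete, here is a minimal sketch. The DataFrame and its column names (A, B) are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
column_sums = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series,
# so individual values can be looked up by column name
row_sums = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(column_sums)
print(row_sums)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.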

Method 1: Use Apply with a Lambda Function

This is my go-to method when the logic is simple enough to fit on one line. It saves me from defining a separate function.

Let’s look at a practical example. Suppose we have a list of employees in a US-based tech firm, and we need to calculate their year-end bonus based on their current salary and performance rating.

import pandas as pd

# Creating a dataset of US employees
data = {
    'Employee_Name': ['Alice Johnson', 'Bob Smith', 'Charlie Davis', 'Diana Prince'],
    'State': ['NY', 'CA', 'TX', 'FL'],
    'Salary': [95000, 110000, 85000, 120000],
    'Rating': [5, 3, 4, 2]
}

df = pd.DataFrame(data)

# I use a lambda function here to calculate a 10% bonus for high performers
df['Bonus'] = df.apply(lambda row: row['Salary'] * 0.10 if row['Rating'] > 3 else 0, axis=1)

print(df)

You can see the output in the screenshot below.

In this code, the lambda function acts as a temporary function. It looks at each row, checks the Rating, and does the math.

Method 2: Define a Custom Function for Complex Logic

Sometimes, business logic gets messy. If I have to check multiple conditions, like US tax brackets or regional shipping rules, a lambda function becomes hard to read.

In these cases, I prefer defining a proper Python function and passing it to apply().

Imagine we are calculating the total price of an order, including state-specific sales tax for New York (4%), California (7.25%), and Texas (6.25%).

import pandas as pd

# Sample US Sales Data
sales_data = {
    'OrderID': [101, 102, 103, 104],
    'State': ['NY', 'CA', 'TX', 'WA'],
    'Price': [200, 150, 300, 100]
}

df_sales = pd.DataFrame(sales_data)

# I define a function to handle the logic
def calculate_total_tax(row):
    state = row['State']
    price = row['Price']
    
    if state == 'NY':
        return price * 1.04
    elif state == 'CA':
        return price * 1.0725
    elif state == 'TX':
        return price * 1.0625
    else:
        return price  # No tax for other states in this example

# Apply the custom function
df_sales['Total_With_Tax'] = df_sales.apply(calculate_total_tax, axis=1)

print(df_sales)

You can see the output in the screenshot below.

This approach keeps your code clean. It is much easier for your teammates (or your future self) to debug a named function than a complex lambda expression.

Method 3: Pass Multiple Arguments to Apply

There are times when your function needs more information than just what is in the row. You might have a global discount rate or a fixed overhead cost.

I use the args parameter in apply() to pass these extra variables.

import pandas as pd

df_products = pd.DataFrame({
    'Product': ['Laptop', 'Monitor', 'Keyboard'],
    'Base_Price': [1200, 300, 50]
})

def apply_discount(row, discount_percentage, shipping_fee):
    discounted_price = row['Base_Price'] * (1 - discount_percentage)
    return discounted_price + shipping_fee

# Here I pass a 15% discount and a $10 flat shipping fee
df_products['Final_Price'] = df_products.apply(
    apply_discount, 
    axis=1, 
    args=(0.15, 10)
)

print(df_products)

You can see the output in the screenshot below.

By using args, you make your functions reusable across different datasets.
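As a possible alternative to positional args, apply() also forwards extra keyword arguments to your function. The sketch below reuses the same product data and function, and only changes how the two extra values are passed.

```python
import pandas as pd

df_products = pd.DataFrame({
    'Product': ['Laptop', 'Monitor', 'Keyboard'],
    'Base_Price': [1200, 300, 50]
})

def apply_discount(row, discount_percentage, shipping_fee):
    discounted_price = row['Base_Price'] * (1 - discount_percentage)
    return discounted_price + shipping_fee

# Keyword arguments that apply() does not recognize itself
# are passed through to apply_discount
df_products['Final_Price'] = df_products.apply(
    apply_discount,
    axis=1,
    discount_percentage=0.15,
    shipping_fee=10
)

print(df_products)
```

Naming the arguments at the call site can make it harder to mix up their order when a function takes several extra values.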

Method 4: Use the result_type Parameter

When you want the apply() function to return more than one column, things can get a bit tricky.

In my experience, using result_type='expand' is the most direct way to turn the output into new columns immediately.

Suppose we want to split a “Full Name” column into “First Name” and “Last Name” for a US voter registration list.

import pandas as pd

df_voters = pd.DataFrame({
    'Full_Name': ['John Doe', 'Jane Smith', 'Samuel Wilson']
})

def split_names(row):
    first, last = row['Full_Name'].split(' ')
    return first, last

# Using expand to create two new columns at once
df_voters[['First_Name', 'Last_Name']] = df_voters.apply(split_names, axis=1, result_type='expand')

print(df_voters)

This prevents you from having to run apply() twice, which avoids a second full pass over the rows on large files.
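For this particular task, a vectorized string method may be worth considering instead of apply(). The following is a sketch using Series.str.split with expand=True on the same sample names. The n=1 argument is my own addition: it limits the split to the first space, so a name with more than two words does not produce extra columns.

```python
import pandas as pd

df_voters = pd.DataFrame({
    'Full_Name': ['John Doe', 'Jane Smith', 'Samuel Wilson']
})

# Split once on the first space and expand the pieces into two columns
name_parts = df_voters['Full_Name'].str.split(' ', n=1, expand=True)
df_voters[['First_Name', 'Last_Name']] = name_parts

print(df_voters)
```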

When Should You Avoid Using Apply?

This is a professional tip that I cannot stress enough: apply() is essentially a for loop under the hood.

If you are working with millions of rows, apply() will be slow. Very slow.

Whenever possible, I try to use Vectorized Operations. These are built-in Pandas and NumPy functions that run much faster because they operate on entire arrays at once.

For example, instead of using apply() to add two columns: df['Total'] = df.apply(lambda x: x['A'] + x['B'], axis=1) # SLOW

You should do: df['Total'] = df['A'] + df['B'] # FAST

However, for complex logic involving strings or custom Python libraries, apply() remains the most readable and straightforward choice.
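Simple if/else logic can often be vectorized too. As a sketch, the bonus rule from Method 1 (10% of salary when the rating is above 3, otherwise 0) could be written with NumPy's where() function. The DataFrame below repeats only the two columns the rule needs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Salary': [95000, 110000, 85000, 120000],
    'Rating': [5, 3, 4, 2]
})

# np.where(condition, value_if_true, value_if_false) works on whole columns
df['Bonus'] = np.where(df['Rating'] > 3, df['Salary'] * 0.10, 0)

print(df)
```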

Performance Comparison: Apply vs. Vectorization

To give you an idea of why this matters, I once worked on a project involving US Census data with over 5 million records.

Using apply() took nearly 2 minutes to categorize income levels. By switching to a vectorized NumPy select() statement, the time dropped to under 2 seconds.

If you find your script is hanging, check if you can replace your row-wise apply() with a vectorized alternative.
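Here is a rough sketch of what a NumPy select() categorization can look like. The income values, the bracket thresholds, and the labels are invented for illustration and are not the ones from the Census project.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes, used only to demonstrate the pattern
df_income = pd.DataFrame({'Income': [25000, 60000, 150000, 48000]})

# Conditions are checked in order; the first one that matches wins
conditions = [
    df_income['Income'] < 40000,
    df_income['Income'] < 100000,
]
choices = ['Low', 'Middle']

# Rows that match no condition get the default label
df_income['Income_Level'] = np.select(conditions, choices, default='High')

print(df_income)
```

Because each condition is evaluated on the whole column at once, no Python function is called per row.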

Practical Example: Categorize US Zip Codes

Let’s put everything together with a real-world scenario. We have a list of Zip Codes, and we want to categorize them by Region.

import pandas as pd

data = {
    'ZipCode': [10001, 90210, 60601, 75201, 30301],
    'City': ['New York', 'Beverly Hills', 'Chicago', 'Dallas', 'Atlanta']
}

df_zip = pd.DataFrame(data)

def get_region(row):
    # zfill(5) restores a leading zero that is lost when a Zip Code is stored as an integer
    zip_prefix = int(str(row['ZipCode']).zfill(5)[0])
    
    if zip_prefix in [0, 1, 2]:
        return 'East Coast'
    elif zip_prefix in [8, 9]:
        return 'West Coast'
    elif zip_prefix in [4, 5, 6]:
        return 'Midwest'
    elif zip_prefix in [3, 7]:
        return 'South'
    else:
        return 'Other'

df_zip['Region'] = df_zip.apply(get_region, axis=1)

print(df_zip)

In this example, we convert each Zip Code to a string, take its first digit, and convert that digit back to an integer inside a custom function to create a new categorical column.
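If this lookup ever needs to run on a very large file, one possible vectorized version is to extract the first digit with string methods and translate it with a dictionary. The sketch below assumes the same digit-to-region grouping as the function above; the name region_by_digit is my own.

```python
import pandas as pd

df_zip = pd.DataFrame({
    'ZipCode': [10001, 90210, 60601, 75201, 30301],
    'City': ['New York', 'Beverly Hills', 'Chicago', 'Dallas', 'Atlanta']
})

# Same grouping as get_region, keyed by the first digit as a string
region_by_digit = {
    '0': 'East Coast', '1': 'East Coast', '2': 'East Coast',
    '8': 'West Coast', '9': 'West Coast',
    '4': 'Midwest', '5': 'Midwest', '6': 'Midwest',
    '3': 'South', '7': 'South',
}

# Pad to five characters so integer Zip Codes keep their leading zero
first_digit = df_zip['ZipCode'].astype(str).str.zfill(5).str[0]

# Digits missing from the dictionary become NaN, then 'Other'
df_zip['Region'] = first_digit.map(region_by_digit).fillna('Other')

print(df_zip)
```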

Troubleshoot Common Errors

I often see developers get frustrated with a few specific errors when using apply().

1. “KeyError” inside the function

This usually happens because you forgot axis=1. Without it, Pandas passes a Column to your function instead of a Row, and it can’t find the other column names you are referencing.
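The following small sketch reproduces the problem on purpose. The function name bonus and the two-row DataFrame are made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'Salary': [95000, 110000], 'Rating': [5, 3]})

def bonus(row):
    return row['Salary'] * 0.10 if row['Rating'] > 3 else 0

try:
    # axis defaults to 0, so bonus() receives a whole column,
    # whose labels are the row numbers 0 and 1, not column names
    df.apply(bonus)
    raised = False
except KeyError as err:
    raised = True
    print(f"KeyError: {err}")

# With axis=1 the function receives a row and the lookups succeed
result = df.apply(bonus, axis=1)
print(result)
```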

2. “AttributeError: ‘float’ object has no attribute ‘split’”

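If you are working with string columns that have missing data (NaN), your function will break, because NaN is a float and has no string methods. I always recommend adding a check for null values, such as pd.isna(row['Column']), before calling string methods. Wrapping the value in str(row['Column']) also avoids the error, but it turns NaN into the literal text 'nan', so use it with care.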
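A null check inside the function might look like the sketch below. The function name first_name and the sample data with one missing value are assumptions for illustration.

```python
import pandas as pd

# One name is missing on purpose
df = pd.DataFrame({'Full_Name': ['John Doe', float('nan'), 'Jane Smith']})

def first_name(row):
    name = row['Full_Name']
    # Return early for missing values instead of calling .split() on a float
    if pd.isna(name):
        return None
    return name.split(' ')[0]

df['First_Name'] = df.apply(first_name, axis=1)

print(df)
```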

3. Slow execution on Large DataFrames

As mentioned before, if your code is taking too long, look into the swifter library. It’s a tool I use that automatically decides if an apply() can be vectorized or parallelized to speed it up.

Summary of Best Practices

Through my years of using Pandas, I’ve developed a mental checklist for applying functions:

Use Vectorization first: If a basic math operator or NumPy function can do it, don’t use apply().
Use axis=1: Always remember this when you need to access multiple columns in a single row.
Keep functions pure: Try to avoid changing global variables inside your applied function; it makes debugging a nightmare.
Use Progress Bars: If you have a long-running apply(), I recommend using tqdm. After calling tqdm.pandas() once, you can run df.progress_apply() in place of df.apply() to see how much time is left.

The Pandas apply() function is an incredibly powerful tool for any data professional. While it may not always be the fastest option for massive datasets, its flexibility and ease of use make it a staple in my data cleaning and transformation workflows.

I hope this guide helps you understand how to use apply() effectively in your own projects. Whether you are dealing with financial data, customer lists, or US-specific regional analysis, mastering this function will significantly improve your efficiency as a Python developer.
