Set Column Names in Pandas

When I first started working with large datasets in Python, I often found myself staring at messy, inconsistent column headers.

It is a common headache, especially when you are pulling data from various sources that don’t follow a standard naming convention.

In my years of experience as a Python developer, I have realized that clean column names are the foundation of readable and maintainable code.

In this tutorial, I will show you exactly how to set column names in Pandas using the same methods I use in my daily data science projects.

Set Column Names While Reading a CSV

One of the most efficient ways I’ve found to handle column names is to set them the moment you load the data.

Suppose we are working with a dataset of popular tech companies based in the USA, but the CSV file doesn’t have a header row.

Instead of loading it first and then fixing it, you can define the names immediately with the names parameter of read_csv().

import pandas as pd

# List of tech companies with Headcount and HQ Location
data = [
    ['Apple', 161000, 'Cupertino, CA'],
    ['Microsoft', 221000, 'Redmond, WA'],
    ['Google', 190000, 'Mountain View, CA'],
    ['Amazon', 1541000, 'Seattle, WA']
]

# Saving a dummy CSV without headers for this example
df_temp = pd.DataFrame(data)
df_temp.to_csv('usa_tech_firms.csv', index=False, header=False)

# Setting column names while reading the file
col_names = ['Company_Name', 'Employee_Count', 'HQ_Location']
df = pd.read_csv('usa_tech_firms.csv', names=col_names)

print(df)

In the code above, I used the names argument to assign meaningful headers right at the start.
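A related case is a file that already has a header row you want to throw away. The sketch below assumes a small in-memory CSV (the csv_text contents and its messy header are made up for illustration) and passes header=0 together with names, so the first line is treated as a header and then replaced.

```python
import io

import pandas as pd

# Hypothetical CSV text that already has a messy header row
csv_text = """CoName,EMP_CNT,hq
Apple,161000,"Cupertino, CA"
Microsoft,221000,"Redmond, WA"
"""

col_names = ['Company_Name', 'Employee_Count', 'HQ_Location']

# header=0 marks the first line as a header row; names= then replaces it
df_replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=col_names)

print(df_replaced)
```

Without header=0, the old header line would be read as a regular data row, so it is worth checking the first few rows after loading.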

Use the columns Attribute to Overwrite All Headers

When I need to replace every single column name in a DataFrame, I find it easiest to use the .columns attribute.

This method is easy, but your new list must contain exactly as many names as the DataFrame has columns; otherwise, Pandas raises a ValueError.

Let’s look at a dataset of US real estate prices where we want to simplify the headers.

import pandas as pd

# Initial data with complex headers
data = {
    'Property_Identification_Number': [101, 102, 103],
    'Market_Value_USD_2024': [450000, 820000, 310000],
    'State_Location_Code': ['TX', 'NY', 'FL']
}

df = pd.DataFrame(data)

# Overwriting all column names with a simple list
df.columns = ['ID', 'Price', 'State']

print("Updated DataFrame:")
print(df)

I usually prefer this method when I am creating a DataFrame from scratch or when the original names are completely unusable.
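To see the length requirement in action, here is a minimal sketch that deliberately passes one name too few. The variable names (df_prices, new_names, length_error) are only for illustration.

```python
import pandas as pd

df_prices = pd.DataFrame({
    'Property_Identification_Number': [101, 102, 103],
    'Market_Value_USD_2024': [450000, 820000, 310000],
    'State_Location_Code': ['TX', 'NY', 'FL'],
})

# Deliberately one name short: three columns, two names
new_names = ['ID', 'Price']

length_error = None
try:
    df_prices.columns = new_names
except ValueError as exc:
    # Pandas refuses the assignment when the lengths differ
    length_error = exc
    print(f"Could not set column names: {exc}")

# A simple guard before assigning avoids the exception entirely
if len(new_names) == len(df_prices.columns):
    df_prices.columns = new_names
else:
    print(f"Expected {len(df_prices.columns)} names, got {len(new_names)}")
```

A quick length check like this is cheap insurance when the list of names comes from a config file or another script.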

Rename Specific Columns with the rename() Function

In my experience, the rename() function is the most flexible tool in the Pandas library for modifying headers.

It allows you to change only the specific columns you care about without worrying about the others.

This is particularly useful when you have a massive DataFrame with 50 columns and you only need to fix two or three.

import pandas as pd

# Dataset of USA National Parks
data = {
    'Park': ['Yellowstone', 'Yosemite', 'Zion'],
    'EST': [1872, 1890, 1919],
    'SQ_MI': [3468, 1169, 229]
}

df = pd.DataFrame(data)

# Using a dictionary to map old names to new names
df.rename(columns={'EST': 'Year_Established', 'SQ_MI': 'Area_Sq_Miles'}, inplace=True)

print(df)

Note that I used inplace=True. By default, rename() returns a new DataFrame and leaves the original untouched, so in my earlier days, when I forgot this, my changes seemed to vanish.

If you don’t use inplace=True, you must assign the result back to a variable, like df = df.rename(…).
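The following sketch shows both behaviors side by side: the default call that returns a new DataFrame, and the optional errors='raise' argument, which makes rename() complain about a misspelled column instead of silently skipping it. The misspelled key 'ESTD' is made up for the demonstration.

```python
import pandas as pd

df_parks = pd.DataFrame({
    'Park': ['Yellowstone', 'Yosemite', 'Zion'],
    'EST': [1872, 1890, 1919],
})

# Without inplace=True, the original stays as it was
renamed = df_parks.rename(columns={'EST': 'Year_Established'})
print(list(df_parks.columns))
print(list(renamed.columns))

# A key that matches no column is ignored by default
unchanged = df_parks.rename(columns={'ESTD': 'Year_Established'})

# errors='raise' turns the silent miss into a KeyError
typo_error = None
try:
    df_parks.rename(columns={'ESTD': 'Year_Established'}, errors='raise')
except KeyError as exc:
    typo_error = exc
    print(f"Rename failed: {exc}")
```

I would reach for errors='raise' in pipelines where a silently skipped rename could go unnoticed for a long time.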

Use a Function to Modify Column Names

Sometimes you don’t want to change specific names, but rather apply a rule to all of them.

For example, I often receive datasets where headers are a mix of uppercase and lowercase letters, or they contain spaces.

I find it very helpful to use the str accessor to clean these up in one go.

import pandas as pd

# USA Car Sales Data with inconsistent naming
data = {
    'CAR MODEL': ['Ford F-150', 'Tesla Model 3', 'Chevrolet Silverado'],
    'Units Sold': [750000, 240000, 520000],
    'MANUFACTURER': ['Ford', 'Tesla', 'GM']
}

df = pd.DataFrame(data)

# Converting all names to lowercase and replacing spaces with underscores
df.columns = df.columns.str.lower().str.replace(' ', '_')

print(df.columns)

This simple trick saves me a lot of time when I’m prepping data for a machine learning model.
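If the rule is more involved than a couple of string methods, rename() also accepts a function for its columns argument and calls it once per header. The helper below, clean_header, is a hypothetical name and the rule it applies is only one possible convention.

```python
import pandas as pd

df_cars = pd.DataFrame({
    ' CAR MODEL ': ['Ford F-150', 'Tesla Model 3'],
    'Units Sold': [750000, 240000],
    'MANUFACTURER': ['Ford', 'Tesla'],
})


def clean_header(name: str) -> str:
    """Trim outer whitespace, lowercase, and replace spaces with underscores."""
    return name.strip().lower().replace(' ', '_')


# rename() applies the function to every column label
df_clean = df_cars.rename(columns=clean_header)

print(list(df_clean.columns))
```

Keeping the rule in a named function makes it easy to reuse across several DataFrames and to test on its own.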

Add a Prefix or Suffix to Column Names

There are times when I’m merging two different datasets, for example, 2023 sales data and 2024 sales data.

To keep track of which column belongs to which year, I like to add a suffix or a prefix.

Pandas provides built-in methods called add_prefix() and add_suffix() that make this incredibly easy.

import pandas as pd

# US Quarterly Revenue Data
data = {
    'Q1': [50000, 60000],
    'Q2': [55000, 62000]
}

df = pd.DataFrame(data)

# Adding a suffix to identify the year
df = df.add_suffix('_2024')

print(df)

This prevents column name collisions when you perform a join or a merge later in your script.
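One thing to watch for: add_suffix() and add_prefix() rename every column, including a column you plan to join on. The sketch below assumes two made-up yearly tables that share a Region key, and moves that key into the index first so that only the value columns get the suffix.

```python
import pandas as pd

# Hypothetical yearly tables sharing a 'Region' key
sales_2023 = pd.DataFrame({
    'Region': ['East', 'West'],
    'Q1': [48000, 57000],
    'Q2': [51000, 59000],
})
sales_2024 = pd.DataFrame({
    'Region': ['East', 'West'],
    'Q1': [50000, 60000],
    'Q2': [55000, 62000],
})

# Move the key into the index so the suffix only touches value columns
left = sales_2023.set_index('Region').add_suffix('_2023')
right = sales_2024.set_index('Region').add_suffix('_2024')

# Join on the shared index; no column names collide
combined = left.join(right)

print(combined)
```

The same pattern works with add_prefix() if you prefer the year at the front of each name.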

Change Column Names During an Aggregation

When I use the groupby function in Pandas, the resulting columns often end up with repetitive or confusing names.

I’ve found that using “Named Aggregation” is the best way to keep your columns clean during this process.

Let’s look at a dataset of US flight delays grouped by the airline.

import pandas as pd

data = {
    'Airline': ['Delta', 'Delta', 'United', 'United', 'American'],
    'Delay_Min': [15, 30, 10, 50, 20]
}

df = pd.DataFrame(data)

# Grouping and setting new column names simultaneously
summary = df.groupby('Airline').agg(
    Average_Delay=('Delay_Min', 'mean'),
    Total_Flights=('Delay_Min', 'count')
)

print(summary)

This approach is much cleaner than performing the aggregation and then renaming the columns in a separate step.
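Named aggregation can also be spelled out with pd.NamedAgg, which some people find easier to read than bare tuples. Here is a small sketch on the same flight data; the output column names and the trailing reset_index() call are my own choices for illustration.

```python
import pandas as pd

df_flights = pd.DataFrame({
    'Airline': ['Delta', 'Delta', 'United', 'United', 'American'],
    'Delay_Min': [15, 30, 10, 50, 20],
})

# Each keyword becomes an output column name
summary_named = df_flights.groupby('Airline').agg(
    Average_Delay=pd.NamedAgg(column='Delay_Min', aggfunc='mean'),
    Longest_Delay=pd.NamedAgg(column='Delay_Min', aggfunc='max'),
).reset_index()  # turn the 'Airline' index back into a regular column

print(summary_named)
```

After reset_index(), the result is a flat table with Airline as an ordinary column, which is usually what I want before exporting.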

Handle Multi-Index Columns

Dealing with Multi-Index (hierarchical) columns can be tricky, even for experienced developers.

Usually, these occur when you use multiple functions in a groupby operation without named aggregation.

To flatten these and set simple names, I use a join technique that I’ve refined over many projects.

import pandas as pd

# Creating a Multi-Index DataFrame
data = {
    'State': ['NY', 'NY', 'CA', 'CA'],
    'City': ['NYC', 'Buffalo', 'LA', 'SF'],
    'Pop': [8000000, 250000, 3800000, 800000]
}

df = pd.DataFrame(data)
multi_df = df.groupby(['State']).agg({'Pop': ['mean', 'max']})

# Flattening the Multi-Index and setting new names
multi_df.columns = ['_'.join(col).strip() for col in multi_df.columns.values]

print(multi_df)

This turns the tuple labels ('Pop', 'mean') and ('Pop', 'max') into the flat names “Pop_mean” and “Pop_max”, which are much easier to work with.
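It can help to look at the raw labels before flattening them. The sketch below prints the tuples and then flattens them with MultiIndex.to_flat_index(), an alternative to reading .values directly; the variable names are only for illustration.

```python
import pandas as pd

df_cities = pd.DataFrame({
    'State': ['NY', 'NY', 'CA', 'CA'],
    'Pop': [8000000, 250000, 3800000, 800000],
})

grouped = df_cities.groupby('State').agg({'Pop': ['mean', 'max']})

# Each column label is a tuple: (original column, aggregation)
original_labels = grouped.columns.tolist()
print(original_labels)

# to_flat_index() yields those tuples as a plain Index, ready to join
flat = grouped.copy()
flat.columns = ['_'.join(levels) for levels in flat.columns.to_flat_index()]

print(list(flat.columns))
```

This assumes every level is a string; if a level holds numbers, convert each part with str() before joining.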

Reorder Columns While Renaming

While not strictly “setting” names, I often need to reorder my columns at the same time I’m renaming them.

You can do this easily by indexing the DataFrame with a list of the column names in the desired order.

import pandas as pd

# US Employee Data
data = {
    'Salary': [95000, 105000],
    'Name': ['Alice Smith', 'Bob Jones'],
    'Dept': ['Engineering', 'Marketing']
}

df = pd.DataFrame(data)

# Reordering and selecting specific columns
df = df[['Name', 'Dept', 'Salary']]

print(df)

I find this particularly useful before exporting a final report to stakeholders, who usually expect an identifying column such as “Name” or “ID” to come first.
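To rename and reorder in a single step, you can chain rename() with a column selection. This is a minimal sketch; the new header names (Employee_Name, Department) are assumptions chosen for the example.

```python
import pandas as pd

df_staff = pd.DataFrame({
    'Salary': [95000, 105000],
    'Name': ['Alice Smith', 'Bob Jones'],
    'Dept': ['Engineering', 'Marketing'],
})

# Rename first, then select the new names in the order the report needs
report = df_staff.rename(
    columns={'Name': 'Employee_Name', 'Dept': 'Department'}
)[['Employee_Name', 'Department', 'Salary']]

print(report)
```

Because the selection uses the new names, a typo in either step fails immediately with a KeyError rather than producing a report with the wrong headers.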

I hope you found this tutorial useful.

In this guide, we covered how to set column names during data import, how to rename specific headers using dictionaries, and how to apply bulk changes with string methods.

Setting clear and concise column names is a small step that makes a huge difference in the quality of your data analysis.
