How to Merge Two Columns in Pandas

Merging columns is one of those tasks I find myself doing in almost every data project I tackle.

Whether I’m cleaning up a messy CRM export or preparing a dataset for a machine learning model, combining strings is essential.

In this tutorial, I will show you exactly how to merge two columns in a Pandas DataFrame using several different methods.

This Tutorial Covers:

Create a Sample US-Based Dataset

To make this practical, let’s use a dataset involving US employees, their locations, and departments.

I’ll create a DataFrame with “First Name,” “Last Name,” and “State” to demonstrate how we can combine these fields.

import pandas as pd

# Creating a sample dataset of US professionals
data = {
    'First_Name': ['James', 'Mary', 'Robert', 'Patricia', 'John'],
    'Last_Name': ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'State': ['NY', 'CA', 'IL', 'TX', 'AZ']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Method 1: Use the Plus (+) Operator

The simplest way I’ve found to merge two columns is by using the plus operator.

It works just like regular string concatenation in Python, provided both columns are of the string (object) type.

In this example, I’ll merge the ‘First_Name’ and ‘Last_Name’ columns to create a ‘Full_Name’ column.

# Merging First_Name and Last_Name with a space in between
df['Full_Name'] = df['First_Name'] + ' ' + df['Last_Name']

print(df[['First_Name', 'Last_Name', 'Full_Name']])

You can see the output in the screenshot below.

I prefer this method for quick tasks because it is readable and very intuitive for anyone familiar with basic Python.

One thing to watch out for is NaN values; if one column has a null value, the result of the addition will also be NaN.

Method 2: Use the str.cat() Function

When I need more control over how columns are joined, I reach for the str.cat() method.

This method is specifically designed for string series and allows you to define a separator easily.

Let’s use this to combine ‘City’ and ‘State’ into a standard US address format (City, State).

# Merging City and State using str.cat
df['Location'] = df['City'].str.cat(df['State'], sep=', ')

print(df[['City', 'State', 'Location']])

You can see the output in the screenshot below.

The beauty of str.cat() is how it handles missing data. You can use the na_rep argument to tell Pandas what to put there if a value is missing.

I find this much cleaner than manually filling NaNs before concatenation.

Method 3: Use the apply() Function with Lambda

Sometimes, you need to merge columns that aren’t just strings, or you need to apply complex logic during the merge.

In those cases, I use the apply() function. It’s slightly slower on massive datasets, but the flexibility is unmatched.

Suppose we want to create a formal label that looks like “Last Name, First Name”.

# Using lambda to format the name formally
df['Formal_Name'] = df.apply(lambda row: f"{row['Last_Name']}, {row['First_Name']}", axis=1)

print(df[['Formal_Name']])

You can see the output in the screenshot below.

Using f-strings inside a lambda function is my go-to trick for creating highly customized column formats.

Setting axis=1 is crucial here because it tells Pandas to apply the function row-wise rather than column-wise.

Method 4: Merge Multiple Columns Using join()

If you have a list of several columns you want to squash into one, adding them one by one with a + sign is tedious.

In my experience, using agg() or apply() with ‘ ‘.join is the most efficient way to handle three or more columns.

Let’s combine the First Name, Last Name, and State into a single “Bio” string.

# Merging three columns into one bio string
cols_to_merge = ['First_Name', 'Last_Name', 'State']
df['User_Bio'] = df[cols_to_merge].apply(lambda x: ' - '.join(x.values), axis=1)

print(df['User_Bio'])

This approach is dynamic. You can pass any list of column names, and it will join them all with the specified delimiter.

Method 5: Use the map() and format() Functions

For those who prefer the classic Python .format() syntax, you can use map() to achieve a merge.

I often use this when I’m dealing with numeric data that needs to be formatted as a string during the merge.

Imagine we have a “Salary” column and we want to merge it with the name for a report.

df['Salary'] = [95000, 105000, 88000, 120000, 92000]

# Merging Name and Salary into a report string
df['Salary_Report'] = df.apply(lambda x: "Employee {} earns ${:,}".format(x['Full_Name'], x['Salary']), axis=1)

print(df['Salary_Report'])

The {:,} inside the format string adds a thousands-separator, which is a common requirement for US financial reporting.

Handle Common Issues: Numeric to String

A mistake I see beginners make often is trying to merge a string column with a numeric column using the + operator.

This will throw a TypeError. You must convert the numeric column to a string first.

# Correct way to merge string and numeric columns
df['ID'] = [101, 102, 103, 104, 105]

# df['Label'] = df['First_Name'] + df['ID']  <-- This would fail!
df['Label'] = df['First_Name'] + "_" + df['ID'].astype(str)

print(df[['First_Name', 'ID', 'Label']])

I always use .astype(str) to ensure the data types are compatible before I start merging.

Which Method Should You Choose?

In my years of developing with Pandas, I’ve learned that the “best” method depends on your specific needs.

If you are just joining two simple strings, the Plus (+) Operator is the fastest and easiest to read.

If you need to handle null values gracefully, str.cat() is the professional’s choice.

For complex formatting or merging multiple data types, apply() with a lambda is the most powerful tool in your kit.

If you are working with very large datasets (millions of rows), try to stay away from apply() and use vectorized operations like + or str.cat().

Vectorized operations are much faster because they run on the underlying C-engine of NumPy rather than looping in Python.

Practical Example: Clean US Phone Numbers

Often, US phone numbers come in three separate columns: Area Code, Exchange, and Line Number.

Let’s see how I would merge these into a standard US format: (XXX) XXX-XXXX.

phone_data = {
    'Area_Code': ['212', '310', '312', '713', '602'],
    'Exchange': ['555', '555', '555', '555', '555'],
    'Line': ['0101', '0102', '0103', '0104', '0105']
}
df_phone = pd.DataFrame(phone_data)

# Merging into (Area) Exchange-Line format
df_phone['Formatted_Phone'] = "(" + df_phone['Area_Code'] + ") " + df_phone['Exchange'] + "-" + df_phone['Line']

print(df_phone)

This is a classic data-cleaning task that highlights why knowing these merging techniques is so valuable.

Merging columns in Pandas is a fundamental skill that makes your data much easier to work with.

How to Merge Two Columns in Pandas

Create a Sample US-Based Dataset

Method 1: Use the Plus (+) Operator

Method 2: Use the str.cat() Function

Method 3: Use the apply() Function with Lambda

Method 4: Merge Multiple Columns Using join()

Method 5: Use the map() and format() Functions

Handle Common Issues: Numeric to String

Which Method Should You Choose?

Practical Example: Clean US Phone Numbers

51 PYTHON PROGRAMS PDF FREE

Aspiring to be a Python developer?

Let’s be friends