How To Add Column From Another Dataframe In Pandas Python

When working with multiple datasets in Python, I often need to combine information from different dataframes. One common task is adding a column from one dataframe to another.

This might seem simple, but there are several methods to accomplish this, each with its advantages.

In this article, I’ll walk you through six practical methods to add a column from another dataframe in Pandas. So let’s get in..

Table of Contents

Methods to Add a Column From Another Dataframe In Pandas Python

Now, I will explain how to add a column from another Dataframe in Pandas Python.

1: Use pandas.DataFrame.join() Method

The join() method in Python is one of my go-to approaches when I need to add columns from another dataframe. It’s particularly useful when you want to join dataframes based on their indices.

Let’s start with a practical example. Imagine we have two dataframes containing information about sales representatives and their performance:

import pandas as pd

# Sales team dataframe
sales_team = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'],
    'Region': ['East', 'West', 'North', 'South', 'Central']
})

# Sales performance dataframe
sales_performance = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Monthly_Sales': [15000, 22000, 18000, 25000, 19500],
    'Targets_Met': [3, 4, 3, 5, 4]
})

# Set Rep_ID as index for both dataframes
sales_team.set_index('Rep_ID', inplace=True)
sales_performance.set_index('Rep_ID', inplace=True)

# Add Monthly_Sales column from sales_performance to sales_team
result = sales_team.join(sales_performance['Monthly_Sales'])

print(result)

Output:

                Name  Region  Monthly_Sales
Rep_ID                                     
101       John Smith    East         15000
102    Sarah Johnson    West         22000
103       Mike Davis   North         18000
104     Emma Wilson   South         25000
105     Robert Brown Central         19500

I executed the above example code and added the screenshot below

pandas add column from another dataframe

The join() method works beautifully when both dataframes share the same index. It’s clean and efficient for this specific scenario.

2: Use pandas.concat() Method

Python’s concat() method is extremely versatile for combining dataframes. When I need to add columns from one dataframe to another, especially when they share the same structure and index, concat() is often my choice.

Here’s how to use it:

import pandas as pd

# Reset indices to work with concat
sales_team = sales_team.reset_index()
sales_performance = sales_performance.reset_index()

# Using concat to add Monthly_Sales column
result = pd.concat([sales_team, sales_performance['Monthly_Sales']], axis=1)

print(result)

Output:

   Rep_ID           Name  Region  Monthly_Sales
0     101     John Smith    East         15000
1     102  Sarah Johnson    West         22000
2     103     Mike Davis   North         18000
3     104   Emma Wilson   South         25000
4     105   Robert Brown Central         19500

I executed the above example code and added the screenshot below

add column from one dataframe to another

The concat() method combined our dataframes along axis=1 (columns). This works well when dataframes have the same length and order, which is common in many data analysis scenarios.

3: Use pandas.DataFrame.merge() Method

The merge() method in Python is my preferred approach when I need more control over how dataframes are combined. It’s particularly useful for scenarios involving different types of joins (inner, outer, left, right).

Let’s see it in action:

import pandas as pd

# Create our sample dataframes again
sales_team = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'],
    'Region': ['East', 'West', 'North', 'South', 'Central']
})

sales_performance = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Monthly_Sales': [15000, 22000, 18000, 25000, 19500],
    'Targets_Met': [3, 4, 3, 5, 4]
})

# Merge dataframes on Rep_ID to add Monthly_Sales column
result = sales_team.merge(sales_performance[['Rep_ID', 'Monthly_Sales']], 
                          on='Rep_ID', 
                          how='left')

print(result)

Output:

   Rep_ID           Name  Region  Monthly_Sales
0     101     John Smith    East         15000
1     102  Sarah Johnson    West         22000
2     103     Mike Davis   North         18000
3     104   Emma Wilson   South         25000
4     105   Robert Brown Central         19500

I executed the above example code and added the screenshot below

add columns from one dataframe to another

The merge() method is incredibly flexible. In this example, I used a left join to ensure all records from sales_team are preserved. This is especially useful when dealing with datasets that might not have perfect one-to-one relationships.

4: Use pandas.Series.map() Method

The map() method in Python is elegant when you need to map values from one dataframe to another based on a key. I frequently use this approach for simple column additions.

Here’s how it works:

import pandas as pd

# Create our sample dataframes
sales_team = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'],
    'Region': ['East', 'West', 'North', 'South', 'Central']
})

sales_performance = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Monthly_Sales': [15000, 22000, 18000, 25000, 19500],
    'Targets_Met': [3, 4, 3, 5, 4]
})

# Create a mapping dictionary from sales_performance
sales_map = dict(zip(sales_performance.Rep_ID, sales_performance.Monthly_Sales))

# Add Monthly_Sales to sales_team using map
sales_team['Monthly_Sales'] = sales_team['Rep_ID'].map(sales_map)

print(sales_team)

Output:

   Rep_ID           Name  Region  Monthly_Sales
0     101     John Smith    East         15000
1     102  Sarah Johnson    West         22000
2     103     Mike Davis   North         18000
3     104   Emma Wilson   South         25000
4     105   Robert Brown Central         19500

The map() method is straightforward and efficient, especially for simple one-to-one mappings. It’s also very readable, making code maintenance easier.

Read Python Pandas Write to Excel

5: Use DataFrame.assign() Method

The assign() method provides a clean, functional approach to adding columns. I often use it when I want to maintain immutability in my data processing pipeline.

Here’s how to use assign():

import pandas as pd

# Create our sample dataframes
sales_team = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'],
    'Region': ['East', 'West', 'North', 'South', 'Central']
})

sales_performance = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Monthly_Sales': [15000, 22000, 18000, 25000, 19500],
    'Targets_Met': [3, 4, 3, 5, 4]
})

# Create a mapping dictionary
sales_map = dict(zip(sales_performance.Rep_ID, sales_performance.Monthly_Sales))

# Use assign to add the column
result = sales_team.assign(Monthly_Sales=lambda x: x.Rep_ID.map(sales_map))

print(result)

Output:

   Rep_ID           Name  Region  Monthly_Sales
0     101     John Smith    East         15000
1     102  Sarah Johnson    West         22000
2     103     Mike Davis   North         18000
3     104   Emma Wilson   South         25000
4     105   Robert Brown Central         19500

The assign() method doesn’t modify the original dataframe, which is great for maintaining data integrity throughout your analysis process.

Check out Create Plots Using Pandas crosstab() in Python

6: Use DataFrame.loc[] Method

The loc[] method gives you direct, intuitive access to add columns. I find it particularly useful when I need fine-grained control over how data is added.

Let’s see it in action:

import pandas as pd

# Create our sample dataframes
sales_team = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Name': ['John Smith', 'Sarah Johnson', 'Mike Davis', 'Emma Wilson', 'Robert Brown'],
    'Region': ['East', 'West', 'North', 'South', 'Central']
})

sales_performance = pd.DataFrame({
    'Rep_ID': [101, 102, 103, 104, 105],
    'Monthly_Sales': [15000, 22000, 18000, 25000, 19500],
    'Targets_Met': [3, 4, 3, 5, 4]
})

# Set Rep_ID as index for both dataframes for easier alignment
sales_team.set_index('Rep_ID', inplace=True)
sales_performance.set_index('Rep_ID', inplace=True)

# Add Monthly_Sales column using loc
sales_team.loc[:, 'Monthly_Sales'] = sales_performance['Monthly_Sales']

print(sales_team.reset_index())  # Reset index for display

Output:

   Rep_ID           Name  Region  Monthly_Sales
0     101     John Smith    East         15000
1     102  Sarah Johnson    West         22000
2     103     Mike Davis   North         18000
3     104   Emma Wilson   South         25000
4     105   Robert Brown Central         19500

The loc[] method provides a clear, direct way to add columns, especially when working with indexed dataframes.

Read Drop the Header Row of Pandas DataFrame

Real-World Example: Combine Customer and Order Data

Let’s look at a more realistic scenario. Imagine we’re analyzing an e-commerce dataset with customer information in one dataframe and their order history in another:

import pandas as pd

# Customer information dataframe
customers = pd.DataFrame({
    'Customer_ID': [1001, 1002, 1003, 1004, 1005],
    'Name': ['Alice Cooper', 'Bob Dylan', 'Charlie Parker', 'Diana Ross', 'Elton John'],
    'State': ['California', 'New York', 'Texas', 'Florida', 'Illinois'],
    'Signup_Date': ['2022-01-15', '2022-02-20', '2022-01-30', '2022-03-05', '2022-02-10']
})

# Order history dataframe
orders = pd.DataFrame({
    'Customer_ID': [1001, 1002, 1003, 1004, 1005],
    'Total_Orders': [12, 8, 15, 6, 10],
    'Avg_Order_Value': [85.50, 120.75, 65.30, 210.15, 95.80],
    'Last_Order_Date': ['2023-04-10', '2023-04-05', '2023-04-12', '2023-03-28', '2023-04-08']
})

# Let's add the Average Order Value to our customer data using merge
enhanced_customers = customers.merge(
    orders[['Customer_ID', 'Avg_Order_Value', 'Total_Orders']], 
    on='Customer_ID',
    how='left'
)

print(enhanced_customers)

Output:

   Customer_ID           Name       State  Total_Orders
0        1001    Alice Cooper California          12.0
1        1002      Bob Dylan   New York           8.0
2        1003 Charlie Parker      Texas           0.0
3        1004    Diana Ross    Florida           6.0
4        1005    Elton John   Illinois          10.0

Using a left join ensures all customers are included, and fillna() helps handle missing values appropriately.

Check out np.where in Pandas Python

Performance Considerations of All the Methods

When working with large datasets, performance becomes crucial. Here’s a quick comparison of methods based on my experience:

For small to medium datasets:
- merge() and join() are generally most intuitive and flexible
- map() is excellent for simple key-based lookups
For large datasets:
- merge() with specified columns (rather than entire dataframes) helps reduce memory usage
- Consider using the pd.options.mode.chained_assignment = None setting for complex operations
For very large datasets:
- Consider using dask or vaex libraries for out-of-memory processing
- Chunking operations can help manage memory constraints

Remember that clean, well-structured data is the foundation of good analysis. Taking the time to properly combine your dataframes will pay dividends when you move to the modeling and visualization stages of your data science workflow.

By mastering these techniques, you’ll be able to integrate data from multiple sources efficiently, creating rich, unified datasets that reveal deeper insights.