When I was working on a data analysis project for a US retail chain, I needed to merge sales data from different stores. The challenge was dealing with missing values (NaN) that appeared after the merge operation. These NaN values were causing calculation errors in my analysis.
In this article, I’ll share several practical ways to handle NaN values when merging DataFrames in pandas, with a focus on replacing them with zeros.
Let's dive into the solutions!
NaN Values in Pandas Merges
When you merge two DataFrames in Python pandas, you’ll often encounter NaN values in the result. These represent missing data that exists in one DataFrame but not in the other.
Let’s see this with a quick example using sales data:
import pandas as pd
import numpy as np
# Store A sales data
store_a = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Smartphone', 'Tablet', 'Headphones'],
    'sales': [5000, 3000, 2000, 1000]
})
# Store B sales data
store_b = pd.DataFrame({
    'product_id': [102, 103, 105, 106],
    'product_name': ['Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'sales': [2500, 1800, 1200, 500]
})
# Merge the DataFrames
merged_sales = pd.merge(store_a, store_b,
                        on=['product_id', 'product_name'],
                        how='outer',
                        suffixes=('_A', '_B'))
print(merged_sales)

In the resulting DataFrame, you’ll notice NaN values where products were sold in one store but not the other.
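Before filling anything, it helps to see exactly where the NaN values come from. The merge() function accepts an indicator=True argument that tags each row with its origin; the 'left_only' and 'right_only' rows are precisely the ones that carry NaN. A quick sketch using the same store data:

```python
import pandas as pd

store_a = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Smartphone', 'Tablet', 'Headphones'],
    'sales': [5000, 3000, 2000, 1000]
})
store_b = pd.DataFrame({
    'product_id': [102, 103, 105, 106],
    'product_name': ['Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'sales': [2500, 1800, 1200, 500]
})

# indicator=True adds a '_merge' column showing which DataFrame each row came from
merged = pd.merge(store_a, store_b,
                  on=['product_id', 'product_name'],
                  how='outer',
                  suffixes=('_A', '_B'),
                  indicator=True)

# Rows flagged 'left_only' or 'right_only' are exactly the ones carrying NaN
print(merged[['product_id', '_merge']])
print("NaN count:", merged[['sales_A', 'sales_B']].isna().sum().sum())
```

This makes it easy to verify, before and after filling, how many missing values the merge actually introduced.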
Fill NaN Values with Zeros in Pandas DataFrames
Now, I will explain how to fill NaN values with zeros in pandas DataFrames.
Method 1: Use fillna() to Replace NaN with Zeros
The simplest and most direct way to replace NaN values with zeros is to use the fillna() method in Python:
# Replace all NaN values with 0
merged_sales_filled = merged_sales.fillna(0)
print(merged_sales_filled)

Output:
   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0
This approach is straightforward but replaces all NaN values in the DataFrame. Sometimes, you might want to be more selective.
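If you want that selectivity, fillna() also accepts a dict mapping column names to fill values, which leaves every other column untouched. A minimal sketch, assuming the sales_A/sales_B column names produced by the merge above:

```python
import pandas as pd
import numpy as np

merged_sales = pd.DataFrame({
    'product_id': [101, 105],
    'product_name': ['Laptop', 'Monitor'],
    'sales_A': [5000.0, np.nan],
    'sales_B': [np.nan, 1200.0]
})

# A dict fills each listed column with its own value;
# columns not in the dict are left as they are
filled = merged_sales.fillna({'sales_A': 0, 'sales_B': 0})
print(filled)
```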
Method 2: Replace NaN Values in Specific Columns
If you only want to replace NaN values in certain columns (like sales figures), you can specify them:
# Replace NaN values only in the sales columns
columns_to_fill = ['sales_A', 'sales_B']
merged_sales[columns_to_fill] = merged_sales[columns_to_fill].fillna(0)
print(merged_sales)

Output:
   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0
This is particularly useful when your DataFrame contains a mix of numeric and categorical data, and you only want to fill numeric fields with zeros.
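For example, in a mixed DataFrame you might fill a numeric column with 0 while giving a missing text field a label instead of a number (a small sketch with made-up data):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product_name': ['Laptop', np.nan, 'Monitor'],
    'sales': [5000.0, np.nan, 1200.0]
})

# Fill the numeric column with 0, but label the missing text value
# instead of forcing a 0 into a string column
df['sales'] = df['sales'].fillna(0)
df['product_name'] = df['product_name'].fillna('Unknown')
print(df)
```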
Method 3: Fill NaN Values in Numeric Columns After the Merge
You can also handle NaN values immediately after the merge by targeting only the numeric columns with select_dtypes():
# First perform the merge
merged_df = pd.merge(store_a, store_b,
                     on=['product_id', 'product_name'],
                     how='outer',
                     suffixes=('_A', '_B'))
# Then apply fillna to numeric columns only
numeric_columns = merged_df.select_dtypes(include=[np.number]).columns
merged_df[numeric_columns] = merged_df[numeric_columns].fillna(0)
print(merged_df)

Output:
   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0
This approach is more efficient for large datasets as it targets only the numeric columns.
Method 4: Use replace() Method
Another approach is to use the pandas replace() method:
# Replace NaN values with 0
merged_sales_replaced = merged_sales.replace(np.nan, 0)
print(merged_sales_replaced)

While similar to fillna(), this method can be useful if you’re already using replace() for other transformations.
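One advantage of replace() is that it accepts a dict, so several “missing” markers can be mapped to 0 in a single call. A sketch assuming a hypothetical -1 sentinel coming from an upstream system:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sales_A': [5000.0, np.nan, -1.0],   # -1 is a hypothetical "missing" sentinel
    'sales_B': [np.nan, 2500.0, 1800.0]
})

# Map both NaN and the -1 sentinel to 0 in one pass
cleaned = df.replace({np.nan: 0, -1.0: 0})
print(cleaned)
```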
Method 5: Handle NaN Values in GroupBy Operations After Merge
Sometimes you need to perform groupby operations after merging DataFrames. Here’s how to handle NaN values in that scenario:
# Create a more complex example with customer data
store_a['customer_region'] = ['East', 'West', 'North', 'South']
store_b['customer_region'] = ['West', 'North', 'East', 'West']
# Merge the DataFrames
merged_regional = pd.merge(store_a, store_b,
                           on=['product_id', 'product_name'],
                           how='outer',
                           suffixes=('_A', '_B'))
# Fill NaN sales with 0 before grouping so the sums stay numeric
merged_regional[['sales_A', 'sales_B']] = merged_regional[['sales_A', 'sales_B']].fillna(0)
regional_sales = merged_regional.groupby('customer_region_A')[['sales_A', 'sales_B']].sum()
print(regional_sales)

Filling before grouping ensures your sums don’t come out as NaN. Keep in mind, though, that groupby() still drops rows whose grouping key (here customer_region_A) is NaN, so products missing from Store A won’t appear in this result.
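On that note, if you want rows with a missing grouping key kept as their own group rather than silently excluded, groupby() supports a dropna=False argument (pandas 1.1+). A small sketch with made-up regional data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'customer_region_A': ['East', 'West', np.nan, 'East'],
    'sales_A': [5000.0, 3000.0, 0.0, 1000.0],
    'sales_B': [0.0, 2500.0, 1200.0, 500.0]
})

# Default behavior: the row with a NaN key is dropped from the result.
# dropna=False keeps it as its own NaN-labeled group.
kept = df.groupby('customer_region_A', dropna=False)[['sales_A', 'sales_B']].sum()
print(kept)
```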
Performance Considerations When Filling NaN Values
When working with large datasets, performance matters. Here are some tips:
- Use inplace=True with fillna() to modify a DataFrame without creating a copy. Be careful, though: calling it on a column selection, as in merged_sales[columns_to_fill].fillna(0, inplace=True), only modifies a temporary copy and leaves the original DataFrame unchanged. Pass a dict to fillna() on the whole DataFrame instead:

merged_sales.fillna({'sales_A': 0, 'sales_B': 0}, inplace=True)

- For very large DataFrames, consider filling NaN values only in the columns you’ll use for calculations:

# Only fill columns needed for analysis
calculation_cols = ['sales_A', 'sales_B', 'profit_A', 'profit_B']
merged_sales[calculation_cols] = merged_sales[calculation_cols].fillna(0)

- Prefer np.nan over Python’s None for missing numeric values. None in a NumPy array forces the slow, memory-hungry object dtype, while np.nan keeps a column as fast float64 (pandas usually converts None to NaN for you, but being explicit avoids surprises):

# Keeps numeric columns as float64
missing_value = np.nan
# Can force object dtype in NumPy arrays
missing_value = None

Real-World Application: Analyze Sales Data
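Putting these tips together, a quick isna().sum() audit tells you which columns actually contain NaN, so you only touch the ones that need filling. A sketch with made-up data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sales_A': [5000.0, np.nan, np.nan],
    'sales_B': [np.nan, 2500.0, 1800.0],
    'product_name': ['Laptop', 'Monitor', 'Keyboard']
})

# Count NaN values per column, then fill only the columns that need it
nan_counts = df.isna().sum()
cols_to_fill = nan_counts[nan_counts > 0].index
df[cols_to_fill] = df[cols_to_fill].fillna(0)
print(df)
```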
In this example, I will show how filling NaN values with zeros allows us to calculate meaningful metrics like growth rates across periods.
# Create more detailed sales data
q1_sales = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'region': ['Northeast', 'West', 'South', 'Midwest'],
    'Q1_revenue': [12500, 8700, 9200, 7600],
    'Q1_units': [45, 32, 28, 19]
})
q2_sales = pd.DataFrame({
    'product_id': [102, 103, 105, 106],
    'region': ['West', 'South', 'Northeast', 'West'],
    'Q2_revenue': [9800, 8900, 11200, 6500],
    'Q2_units': [36, 26, 40, 22]
})
# Merge quarterly data
quarterly_comparison = pd.merge(q1_sales, q2_sales,
                                on='product_id',
                                how='outer',
                                suffixes=('_Q1', '_Q2'))
# Fill NaN values with 0 for numeric columns
numeric_cols = quarterly_comparison.select_dtypes(include=[np.number]).columns
quarterly_comparison[numeric_cols] = quarterly_comparison[numeric_cols].fillna(0)
# Calculate growth
quarterly_comparison['revenue_growth'] = quarterly_comparison['Q2_revenue'] - quarterly_comparison['Q1_revenue']
quarterly_comparison['units_growth'] = quarterly_comparison['Q2_units'] - quarterly_comparison['Q1_units']
print(quarterly_comparison[['product_id', 'Q1_revenue', 'Q2_revenue', 'revenue_growth']])

Common Mistakes to Avoid When Handling NaN Values
Let me walk you through some common mistakes to avoid when handling NaN values.
- Not distinguishing between 0 and NaN: Zero is a real value (for example, a product that genuinely had no sales), while NaN means the value is unknown or missing – they carry different meanings in your data.
- Filling all columns indiscriminately: Some columns (like IDs or categorical data) might need different handling than numeric columns.
- Not checking data types: Attempting to fill NaN values in non-numeric columns with 0 can cause unintended type conversions.
- Ignoring the business context: Sometimes, NaN should be filled with the mean, median, or previous value instead of zero, depending on what makes sense for your analysis.
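To illustrate that last point, here is a quick comparison of zero-filling against two common alternatives, the median and forward fill, on a small made-up series:

```python
import pandas as pd
import numpy as np

sales = pd.Series([100.0, np.nan, 120.0, np.nan, 90.0])

# Three fill strategies, each appropriate in a different context:
median_filled = sales.fillna(sales.median())   # NaN means "typical, unrecorded"
forward_filled = sales.ffill()                 # NaN means "unchanged since last reading"
zero_filled = sales.fillna(0)                  # NaN means "no sales occurred"
print(median_filled.tolist())
print(forward_filled.tolist())
print(zero_filled.tolist())
```

The right choice depends entirely on what a missing value means in your business context, not on which method is easiest to call.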
I hope you found this article helpful for handling NaN values when merging pandas DataFrames. The right approach depends on your specific dataset and analysis needs.
The methods covered in this tutorial are fillna(), the replace() method, replacing NaN values in specific columns, filling NaN values in numeric columns after the merge, and handling NaN values in groupby operations after the merge.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries. Check out my profile.