How to Fill NaN Values with Zeros in Pandas DataFrames

When I was working on a data analysis project for a US retail chain, I needed to merge sales data from different stores. The challenge was dealing with missing values (NaN) that appeared after the merge operation. These NaN values were causing calculation errors in my analysis.

In this article, I’ll share several practical ways to handle NaN values when merging DataFrames in pandas, with a focus on replacing them with zeros.

Let us see the solutions!

NaN Values in Pandas Merges

When you merge two DataFrames in Python pandas, you’ll often encounter NaN values in the result. These represent missing data that exists in one DataFrame but not in the other.

Let’s see this with a quick example using sales data:

import pandas as pd
import numpy as np

# Store A sales data
store_a = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Smartphone', 'Tablet', 'Headphones'],
    'sales': [5000, 3000, 2000, 1000]
})

# Store B sales data
store_b = pd.DataFrame({
    'product_id': [102, 103, 105, 106],
    'product_name': ['Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'sales': [2500, 1800, 1200, 500]
})

# Merge the DataFrames
merged_sales = pd.merge(store_a, store_b, 
                        on=['product_id', 'product_name'], 
                        how='outer',
                        suffixes=('_A', '_B'))

print(merged_sales)

In the resulting DataFrame, you’ll notice NaN values where products were sold in one store but not the other.

Fill NaN Values with Zeros in Pandas DataFrames

Now, I will explain how to fill NaN values with zeros in pandas DataFrames.

Read Fix “Function Not Implemented for This Dtype” Error in Python

Method 1: Use fillna() to Replace NaN with Zeros

The simplest and most direct way to replace NaN values with zeros is to use the fillna() method in Python:

# Replace all NaN values with 0
merged_sales_filled = merged_sales.fillna(0)

print(merged_sales_filled)

Output:

   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0

You can refer to the screenshot below to see the output.

replace nan with 0 pandas

This approach is straightforward but replaces all NaN values in the DataFrame. Sometimes, you might want to be more selective.

Method 2: Replace NaN Values in Specific Columns

If you only want to replace NaN values in certain columns (like sales figures), you can specify them:

# Replace NaN values only in the sales columns
columns_to_fill = ['sales_A', 'sales_B']
merged_sales[columns_to_fill] = merged_sales[columns_to_fill].fillna(0)

print(merged_sales)

Output:

   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0

You can refer to the screenshot below to see the output.

pandas replace nan with 0

This is particularly useful when your DataFrame contains a mix of numeric and categorical data, and you only want to fill numeric fields with zeros.

Method 3: Fill NaN Values During the Merge Operation

You can also handle NaN values directly during the merge operation using a Python lambda function:

# First perform the merge
merged_df = pd.merge(store_a, store_b, 
                     on=['product_id', 'product_name'], 
                     how='outer',
                     suffixes=('_A', '_B'))

# Then apply fillna to numeric columns only
numeric_columns = merged_df.select_dtypes(include=[np.number]).columns
merged_df[numeric_columns] = merged_df[numeric_columns].fillna(0)

print(merged_df)

Output:

   product_id product_name  sales_A  sales_B
0         101       Laptop   5000.0      0.0
1         102   Smartphone   3000.0   2500.0
2         103       Tablet   2000.0   1800.0
3         104   Headphones   1000.0      0.0
4         105      Monitor      0.0   1200.0
5         106     Keyboard      0.0    500.0

You can refer to the screenshot below to see the output.

how to replace nan with 0 in pandas

This approach is more efficient for large datasets as it targets only the numeric columns.

Check out Convert DataFrame To NumPy Array Without Index in Python

Method 4: Use replace() Method

Another approach is to use Python replace() method:

# Replace NaN values with 0
merged_sales_replaced = merged_sales.replace(np.nan, 0)

print(merged_sales_replaced)

While similar to fillna(), this method can be useful if you’re already using replace() for other transformations.

Method 5: Handle NaN Values in GroupBy Operations After Merge

Sometimes you need to perform groupby operations after merging DataFrames. Here’s how to handle NaN values in that scenario:

# Create a more complex example with customer data
store_a['customer_region'] = ['East', 'West', 'North', 'South']
store_b['customer_region'] = ['West', 'North', 'East', 'West']

# Merge the DataFrames
merged_regional = pd.merge(store_a, store_b, 
                           on=['product_id', 'product_name'], 
                           how='outer',
                           suffixes=('_A', '_B'))

# Group by region and sum sales, filling NaN with 0 before grouping
merged_regional[['sales_A', 'sales_B']] = merged_regional[['sales_A', 'sales_B']].fillna(0)
regional_sales = merged_regional.groupby('customer_region_A')[['sales_A', 'sales_B']].sum()

print(regional_sales)

This ensures your aggregations don’t exclude data points with missing values in some columns.

Performance Considerations When Filling NaN Values

When working with large datasets, performance matters. Here are some tips:

  1. Use inplace=True with fillna() to avoid creating a copy of your DataFrame:
   merged_sales[columns_to_fill].fillna(0, inplace=True)
  1. For very large DataFrames, consider filling NaN values only in the columns you’ll use for calculations:
   # Only fill columns needed for analysis
   calculation_cols = ['sales_A', 'sales_B', 'profit_A', 'profit_B']
   merged_sales[calculation_cols] = merged_sales[calculation_cols].fillna(0)
  1. Using np.nan is more memory-efficient than Python’s None:
   # Better for performance
   missing_values = np.nan
   # Less efficient
   missing_values = None

Real-World Application: Analyze Sales Data

In this example, I will show how filling NaN values with zeros allows us to calculate meaningful metrics like growth rates across periods.

# Create more detailed sales data
q1_sales = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'region': ['Northeast', 'West', 'South', 'Midwest'],
    'Q1_revenue': [12500, 8700, 9200, 7600],
    'Q1_units': [45, 32, 28, 19]
})

q2_sales = pd.DataFrame({
    'product_id': [102, 103, 105, 106],
    'region': ['West', 'South', 'Northeast', 'West'],
    'Q2_revenue': [9800, 8900, 11200, 6500],
    'Q2_units': [36, 26, 40, 22]
})

# Merge quarterly data
quarterly_comparison = pd.merge(q1_sales, q2_sales, 
                               on='product_id', 
                               how='outer',
                               suffixes=('_Q1', '_Q2'))

# Fill NaN values with 0 for numeric columns
numeric_cols = quarterly_comparison.select_dtypes(include=[np.number]).columns
quarterly_comparison[numeric_cols] = quarterly_comparison[numeric_cols].fillna(0)

# Calculate growth
quarterly_comparison['revenue_growth'] = quarterly_comparison['Q2_revenue'] - quarterly_comparison['Q1_revenue']
quarterly_comparison['units_growth'] = quarterly_comparison['Q2_units'] - quarterly_comparison['Q1_units']

print(quarterly_comparison[['product_id', 'Q1_revenue', 'Q2_revenue', 'revenue_growth']])

Common Mistakes to Avoid When Handling NaN Values

Let me explain to you some common mistakes that you need to avoid when handling NaN values.

  1. Not distinguishing between 0 and NaN: Zero means “no value” while NaN means “unknown value” – they have different meanings in your data.
  2. Filling all columns indiscriminately: Some columns (like IDs or categorical data) might need different handling than numeric columns.
  3. Not checking data types: Attempting to fill NaN values in non-numeric columns with 0 can cause unintended type conversions.
  4. Ignoring the business context: Sometimes, NaN should be filled with the mean, median, or previous value instead of zero, depending on what makes sense for your analysis.

I hope you found this article helpful for handling NaN values when merging pandas DataFrames. The right approach depends on your specific dataset and analysis needs.

The methods that I explain dm this tutorial are using fillna(), replace() method, replacing NaN values in specific columns, filling NaN values during the merge operation, and handling NaN values in groupby operations after the merge.

Other Python articles you may also like:

51 Python Programs

51 PYTHON PROGRAMS PDF FREE

Download a FREE PDF (112 Pages) Containing 51 Useful Python Programs.

pyython developer roadmap

Aspiring to be a Python developer?

Download a FREE PDF on how to become a Python developer.

Let’s be friends

Be the first to know about sales and special discounts.