Convert A DataFrame To A Nested Dictionary In Python

Recently, I was working on a data analytics project where I needed to convert a Pandas DataFrame into a nested dictionary format for a JSON API endpoint. The challenge was that I needed a specific structure with multiple levels of nesting.

Fortunately, Pandas offers several flexible methods to transform DataFrames into nested dictionaries that can accommodate various use cases.

In this article, I’ll share four practical methods to convert a DataFrame to a nested dictionary in Python, with real examples you can implement right away.

Nested Dictionary in Python

A nested dictionary is simply a dictionary in Python that contains other dictionaries as values. It allows you to create hierarchical data structures with multiple levels.

For example, a nested dictionary might look like this:

{
    'California': {
        'population': 39.5,
        'capital': 'Sacramento',
        'largest_city': 'Los Angeles'
    },
    'Texas': {
        'population': 29.1,
        'capital': 'Austin',
        'largest_city': 'Houston'
    }
}
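
As a small illustration of how such a structure is read, the sketch below chains one key per nesting level. The variable name `states` is an assumption made for this example; the data mirrors the dictionary above.

```python
# Hypothetical variable name; the data mirrors the example above.
states = {
    'California': {
        'population': 39.5,
        'capital': 'Sacramento',
        'largest_city': 'Los Angeles'
    },
    'Texas': {
        'population': 29.1,
        'capital': 'Austin',
        'largest_city': 'Houston'
    }
}

# Chain one key per nesting level to reach an inner value.
texas_capital = states['Texas']['capital']

# .get() with an empty-dict default avoids a KeyError for a missing outer key.
florida_capital = states.get('Florida', {}).get('capital')

print(texas_capital)
print(florida_capital)
```

Using .get() on the outer level is optional; it is mainly useful when the outer keys come from user input or an external source.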

Convert A DataFrame To A Nested Dictionary In Python

Now, let me walk you through the methods to convert a DataFrame to a nested dictionary in Python.

Method 1: Use to_dict() with orient='index'

The simplest way to convert a DataFrame to a nested dictionary is to use the Pandas DataFrame method to_dict() with the orient='index' parameter.

Let’s say we have a DataFrame containing US sales data:

import pandas as pd

# Sample US sales data
data = {
    'Region': ['Northeast', 'Southeast', 'Midwest', 'West'],
    'Q1_Sales': [250000, 175000, 300000, 225000],
    'Q2_Sales': [310000, 190000, 340000, 280000],
    'Growth': [0.15, 0.05, 0.12, 0.18]
}

df = pd.DataFrame(data)
print(df)

To convert this DataFrame to a nested dictionary with regions as the primary keys:

# Convert DataFrame to nested dictionary with region as the key
nested_dict = df.set_index('Region').to_dict(orient='index')
print(nested_dict)

The output will be:

{
    'Northeast': {'Q1_Sales': 250000, 'Q2_Sales': 310000, 'Growth': 0.15},
    'Southeast': {'Q1_Sales': 175000, 'Q2_Sales': 190000, 'Growth': 0.05},
    'Midwest': {'Q1_Sales': 300000, 'Q2_Sales': 340000, 'Growth': 0.12},
    'West': {'Q1_Sales': 225000, 'Q2_Sales': 280000, 'Growth': 0.18}
}

This method is perfect when you want to use one of your DataFrame columns as the primary key for your nested dictionary.
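
One caveat worth checking: the column you promote to the index should hold unique values. To my knowledge, current Pandas releases raise a ValueError from to_dict(orient='index') when the index contains duplicates. The sketch below uses made-up data (`dup_df` is a hypothetical name) and shows one possible way to guard against that by keeping the last row per key; whether "keep the last row" is the right policy depends on your data.

```python
import pandas as pd

# Hypothetical data: 'West' appears twice on purpose.
dup_df = pd.DataFrame({
    'Region': ['West', 'West', 'Midwest'],
    'Q1_Sales': [225000, 240000, 300000],
})

indexed = dup_df.set_index('Region')

if indexed.index.is_unique:
    result = indexed.to_dict(orient='index')
else:
    # Assumption for this sketch: the last row per region wins.
    deduplicated = indexed[~indexed.index.duplicated(keep='last')]
    result = deduplicated.to_dict(orient='index')

print(result)
```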

Method 2: Create Multi-Level Nested Dictionaries

Sometimes you need a more complex nested structure with multiple levels. Let’s say we have a DataFrame with detailed US sales data by state and product:

# More detailed US sales data
data = {
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'Product': ['Laptops', 'Phones', 'Laptops', 'Phones', 'Laptops', 'Phones'],
    'Sales': [500000, 750000, 300000, 450000, 425000, 575000],
    'Units': [2000, 3000, 1200, 1800, 1700, 2300]
}

df = pd.DataFrame(data)
print(df)

To create a nested dictionary with states as the primary keys and products as secondary keys:

# Create a multi-level nested dictionary
nested_dict = {}

for _, row in df.iterrows():
    state = row['State']
    product = row['Product']

    if state not in nested_dict:
        nested_dict[state] = {}

    nested_dict[state][product] = {
        'Sales': row['Sales'],
        'Units': row['Units']
    }

print(nested_dict)

This will produce:

{
    'California': {
        'Laptops': {'Sales': 500000, 'Units': 2000},
        'Phones': {'Sales': 750000, 'Units': 3000}
    },
    'Texas': {
        'Laptops': {'Sales': 300000, 'Units': 1200},
        'Phones': {'Sales': 450000, 'Units': 1800}
    },
    'New York': {
        'Laptops': {'Sales': 425000, 'Units': 1700},
        'Phones': {'Sales': 575000, 'Units': 2300}
    }
}

This method gives you full control over the structure of your nested dictionary.
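
A slightly more compact variant of the same loop, offered as a sketch rather than a replacement: it uses itertuples() instead of iterrows() and dict.setdefault() instead of the explicit membership check. The name `sales_df` and the shortened data are assumptions made to keep the example self-contained.

```python
import pandas as pd

# Hypothetical, shortened copy of the sales data used above.
sales_df = pd.DataFrame({
    'State': ['California', 'California', 'Texas'],
    'Product': ['Laptops', 'Phones', 'Laptops'],
    'Sales': [500000, 750000, 300000],
    'Units': [2000, 3000, 1200],
})

nested = {}
for row in sales_df.itertuples(index=False):
    # setdefault returns the existing inner dict for this state,
    # or inserts and returns a new empty one.
    nested.setdefault(row.State, {})[row.Product] = {
        'Sales': row.Sales,
        'Units': row.Units,
    }

print(nested)
```

Note that attribute access such as row.State only works when the column names are valid Python identifiers.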

Method 3: Use groupby() for Nested Dictionaries

Another approach is to use Pandas’ groupby() function to structure your nested dictionary:

# Using groupby to create a nested dictionary
grouped = df.groupby(['State', 'Product']).apply(lambda x: x[['Sales', 'Units']].to_dict('records')[0])
nested_dict = grouped.to_dict()
print(nested_dict)

This will produce a dictionary with tuple keys. Note that groupby() sorts the group keys by default, so New York comes before Texas:

{
    ('California', 'Laptops'): {'Sales': 500000, 'Units': 2000},
    ('California', 'Phones'): {'Sales': 750000, 'Units': 3000},
    ('New York', 'Laptops'): {'Sales': 425000, 'Units': 1700},
    ('New York', 'Phones'): {'Sales': 575000, 'Units': 2300},
    ('Texas', 'Laptops'): {'Sales': 300000, 'Units': 1200},
    ('Texas', 'Phones'): {'Sales': 450000, 'Units': 1800}
}

To restructure this into our desired nested format:

# Convert from tuple keys to nested dictionary
restructured_dict = {}
for (state, product), values in nested_dict.items():
    if state not in restructured_dict:
        restructured_dict[state] = {}
    restructured_dict[state][product] = values

print(restructured_dict)

This method leverages Pandas’ powerful groupby capabilities for more complex aggregations.
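
If the tuple-key step is not needed, one possible shortcut is to group by the outer key only and call to_dict(orient='index') on each group. This is a sketch under the assumption that each product appears at most once per state; `sales_df` and `by_state` are hypothetical names.

```python
import pandas as pd

# Hypothetical copy of the state/product sales data used above.
sales_df = pd.DataFrame({
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'Product': ['Laptops', 'Phones', 'Laptops', 'Phones', 'Laptops', 'Phones'],
    'Sales': [500000, 750000, 300000, 450000, 425000, 575000],
    'Units': [2000, 3000, 1200, 1800, 1700, 2300],
})

# One outer key per state; each group is re-indexed by product
# so that to_dict(orient='index') builds the inner level.
by_state = {
    state: group.set_index('Product')[['Sales', 'Units']].to_dict(orient='index')
    for state, group in sales_df.groupby('State')
}

print(by_state)
```

Because groupby() sorts keys by default, the outer keys come out in alphabetical order; pass sort=False to groupby() to keep the order of first appearance instead.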

Method 4: Use to_dict() with orient='records' and Further Processing

For more flexibility, we can use to_dict() with orient='records' and then further process the data:

# Using to_dict with orient='records'
records = df.to_dict(orient='records')

# Processing the records into a nested dictionary
nested_dict = {}
for record in records:
    state = record.pop('State')
    product = record.pop('Product')

    if state not in nested_dict:
        nested_dict[state] = {}

    nested_dict[state][product] = record

print(nested_dict)

This produces the same result as our earlier methods but offers more flexibility in how you structure your final dictionary.
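
Since the motivating use case was a JSON API, here is a minimal sketch of serializing the nested result with the standard json module. The `to_native` fallback is a hypothetical helper added for this example: it only runs if a value is not already a plain Python type (for instance, a NumPy scalar from an older Pandas version).

```python
import json

import pandas as pd

# Hypothetical, shortened copy of the sales data used above.
sales_df = pd.DataFrame({
    'State': ['California', 'Texas', 'Texas'],
    'Product': ['Laptops', 'Laptops', 'Phones'],
    'Sales': [500000, 300000, 450000],
    'Units': [2000, 1200, 1800],
})

nested = {}
for record in sales_df.to_dict(orient='records'):
    state = record.pop('State')
    product = record.pop('Product')
    nested.setdefault(state, {})[product] = record


def to_native(value):
    """Fallback for json.dumps: unwrap NumPy scalars, which expose .item()."""
    if hasattr(value, 'item'):
        return value.item()
    raise TypeError(f'Cannot serialize {type(value).__name__}')


payload = json.dumps(nested, default=to_native, indent=2)
print(payload)
```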

Custom Nesting for Specific Business Requirements

In real-world applications, you might need very specific nesting structures. Let’s say we want to organize our US retail data by region, then state, then city:

# More complex US retail data
data = {
    'Region': ['West', 'West', 'South', 'South', 'Northeast', 'Northeast'],
    'State': ['California', 'California', 'Texas', 'Texas', 'New York', 'New York'],
    'City': ['Los Angeles', 'San Francisco', 'Houston', 'Dallas', 'NYC', 'Buffalo'],
    'Revenue': [1200000, 950000, 850000, 780000, 1500000, 420000],
    'Customers': [12000, 9500, 8500, 7800, 15000, 4200]
}

df = pd.DataFrame(data)

# Creating a three-level nested dictionary
nested_retail = {}

for _, row in df.iterrows():
    region = row['Region']
    state = row['State']
    city = row['City']

    if region not in nested_retail:
        nested_retail[region] = {}

    if state not in nested_retail[region]:
        nested_retail[region][state] = {}

    nested_retail[region][state][city] = {
        'Revenue': row['Revenue'],
        'Customers': row['Customers']
    }

print(nested_retail)

This creates a three-level nested dictionary that might be perfect for a regional sales dashboard or API.
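
When the number of levels varies, the hand-written loop can be generalized. The recursive helper below is a sketch, not part of Pandas: `nest_by_columns` is a hypothetical function, and it assumes each combination of key values identifies exactly one row (if there are several, only the first is kept).

```python
import pandas as pd


def nest_by_columns(frame, keys):
    """Hypothetical helper: nest `frame` by each column in `keys`, in order."""
    if not keys:
        # No keys left: the remaining columns of the first row form the leaf.
        return frame.iloc[0].to_dict()
    first, rest = keys[0], keys[1:]
    return {
        value: nest_by_columns(group.drop(columns=first), rest)
        # sort=False keeps keys in order of first appearance.
        for value, group in frame.groupby(first, sort=False)
    }


# Hypothetical, shortened copy of the retail data used above.
retail_df = pd.DataFrame({
    'Region': ['West', 'West', 'South', 'Northeast'],
    'State': ['California', 'California', 'Texas', 'New York'],
    'City': ['Los Angeles', 'San Francisco', 'Houston', 'NYC'],
    'Revenue': [1200000, 950000, 850000, 1500000],
    'Customers': [12000, 9500, 8500, 15000],
})

nested_retail = nest_by_columns(retail_df, ['Region', 'State', 'City'])
print(nested_retail)
```

Changing the nesting order is then a matter of reordering the list of key columns, for example ['State', 'City'] for a two-level result.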

Handle Complex DataFrames with MultiIndex

When working with complex data structures, you might have a DataFrame with a MultiIndex. Here’s how to handle that:

# Creating a DataFrame with MultiIndex
index = pd.MultiIndex.from_tuples([
    ('2022', 'Q1', 'East'),
    ('2022', 'Q1', 'West'),
    ('2022', 'Q2', 'East'),
    ('2022', 'Q2', 'West'),
    ('2023', 'Q1', 'East'),
    ('2023', 'Q1', 'West')
], names=['Year', 'Quarter', 'Region'])

data = {
    'Sales': [250000, 320000, 280000, 350000, 310000, 380000],
    'Units': [1000, 1280, 1120, 1400, 1240, 1520]
}

multi_df = pd.DataFrame(data, index=index)
print(multi_df)

# Converting to nested dictionary
nested_from_multi = multi_df.to_dict(orient='index')

# Restructuring for better nesting
restructured = {}
for (year, quarter, region), values in nested_from_multi.items():
    if year not in restructured:
        restructured[year] = {}

    if quarter not in restructured[year]:
        restructured[year][quarter] = {}

    restructured[year][quarter][region] = values

print(restructured)

This approach is particularly useful for time series data with multiple dimensions.
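
The restructuring loop above is tied to exactly three index levels. As a sketch of a depth-independent version, the hypothetical helper `nest_multiindex` below walks each tuple key level by level. It assumes the DataFrame has a MultiIndex with unique entries, so that to_dict(orient='index') yields tuple keys.

```python
import pandas as pd


def nest_multiindex(frame):
    """Hypothetical helper: nest a MultiIndex DataFrame by its index levels."""
    nested = {}
    for key, values in frame.to_dict(orient='index').items():
        # Split the tuple key into the outer levels and the innermost level.
        *parents, leaf = key
        node = nested
        for level in parents:
            # Descend one level, creating the inner dict on first use.
            node = node.setdefault(level, {})
        node[leaf] = values
    return nested


# Hypothetical, shortened copy of the MultiIndex data used above.
index = pd.MultiIndex.from_tuples([
    ('2022', 'Q1', 'East'),
    ('2022', 'Q1', 'West'),
    ('2023', 'Q1', 'East'),
], names=['Year', 'Quarter', 'Region'])

small_multi_df = pd.DataFrame(
    {'Sales': [250000, 320000, 310000], 'Units': [1000, 1280, 1240]},
    index=index,
)

restructured = nest_multiindex(small_multi_df)
print(restructured)
```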

I hope you found this article helpful for converting Python Pandas DataFrames to nested dictionaries. Each method has its strengths, and the best choice depends on your specific requirements and the structure of your data.

Remember that nested dictionaries are incredibly versatile for representing hierarchical data and are perfect for tasks like creating JSON responses for APIs, organizing complex datasets, or preparing data for visualization tools.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers. More about us.
