How I Find Duplicate Values in a Python Dictionary

While working on a data-cleaning project for a retail company in the USA, I faced an interesting challenge. I had a Python dictionary containing customer IDs as keys and their email addresses as values.
Some customers accidentally used the same email address for multiple accounts, and I needed to find all duplicate values in that dictionary.

At first, I thought Python might have a built-in function to detect duplicates in a dictionary, but it doesn’t. So, I had to come up with a few simple and efficient ways to handle this.

In this tutorial, I’ll share four easy methods I personally use to find duplicate values in a Python dictionary.

Method 1 – Use a For Loop and a Temporary Set

This is the most straightforward way to find duplicate values in a Python dictionary. I often use this method when working with small datasets or when I want to clearly see what’s happening step by step.

Here’s how it works:

customer_data = {
    "C001": "john.doe@example.com",
    "C002": "mary.smith@example.com",
    "C003": "john.doe@example.com",
    "C004": "lisa.jones@example.com",
    "C005": "mary.smith@example.com"
}

# Create two sets to store unique and duplicate values
unique_values = set()
duplicate_values = set()

# Loop through dictionary values
for email in customer_data.values():
    if email in unique_values:
        duplicate_values.add(email)
    else:
        unique_values.add(email)

print("Duplicate values found:", duplicate_values)

When you run this Python code, you’ll get output like the following (sets are unordered, so the elements may print in a different order):

Duplicate values found: {'mary.smith@example.com', 'john.doe@example.com'}


This method is simple and easy to understand. However, if you’re working with large datasets, there are more efficient ways to handle them.

Method 2 – Use a Python Set Comprehension and count()

If you love writing compact, Pythonic code, this approach is for you. Here, I combine a set comprehension with the list count() method to identify duplicate values in a single expression.

customer_data = {
    "C001": "john.doe@example.com",
    "C002": "mary.smith@example.com",
    "C003": "john.doe@example.com",
    "C004": "lisa.jones@example.com",
    "C005": "mary.smith@example.com"
}

# Extract all values
values = list(customer_data.values())

# Find duplicates using a set comprehension
duplicates = {value for value in values if values.count(value) > 1}

print("Duplicate values found:", duplicates)


This method produces the same result as the previous one in fewer lines of code. Just keep in mind that count() rescans the entire list on every call, so this approach is O(n²), fine for quick insights on small datasets, but slow on large ones.

Method 3 – Use Python’s collections.Counter

This is one of my favorite methods because it’s both clean and efficient. The Counter class from Python’s collections module makes it incredibly easy to count occurrences of each value.

Here’s how I use it:

from collections import Counter

customer_data = {
    "C001": "john.doe@example.com",
    "C002": "mary.smith@example.com",
    "C003": "john.doe@example.com",
    "C004": "lisa.jones@example.com",
    "C005": "mary.smith@example.com",
    "C006": "susan.williams@example.com"
}

# Count occurrences of each value
value_counts = Counter(customer_data.values())

# Extract duplicates (values that appear more than once)
duplicates = [email for email, count in value_counts.items() if count > 1]

print("Duplicate values found:", duplicates)

Output:

Duplicate values found: ['john.doe@example.com', 'mary.smith@example.com']


This method is highly efficient for large datasets. I often use it when analyzing thousands of records because each value is counted in a single pass, and in CPython, Counter delegates the counting loop to an optimized C helper.
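If you also want to see how many times each value appears, not just which ones repeat, Counter’s most_common() method returns (value, count) pairs sorted by count. Here’s a quick sketch using a made-up list of emails:

```python
from collections import Counter

emails = [
    "john.doe@example.com",
    "mary.smith@example.com",
    "john.doe@example.com",
    "lisa.jones@example.com",
    "mary.smith@example.com",
    "john.doe@example.com",
]

# most_common() yields (value, count) pairs, highest count first
for email, count in Counter(emails).most_common():
    if count > 1:
        print(f"{email}: {count}")

# Prints:
# john.doe@example.com: 3
# mary.smith@example.com: 2
```

This is handy when you want to report the worst offenders first, since the most frequently duplicated values come out at the top.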

Method 4 – Find Duplicate Keys for Each Value

Sometimes you don’t just want to find duplicate values; you also want to know which keys share those values.
This method is perfect for that scenario.

Here’s how I handle it in Python:

from collections import defaultdict

customer_data = {
    "C001": "john.doe@example.com",
    "C002": "mary.smith@example.com",
    "C003": "john.doe@example.com",
    "C004": "lisa.jones@example.com",
    "C005": "mary.smith@example.com",
    "C006": "susan.williams@example.com"
}

# Create a dictionary where each value maps to a list of keys
reverse_dict = defaultdict(list)

for key, value in customer_data.items():
    reverse_dict[value].append(key)

# Filter out values that have more than one key
duplicates = {value: keys for value, keys in reverse_dict.items() if len(keys) > 1}

print("Duplicate values and their keys:")
for value, keys in duplicates.items():
    print(f"{value}: {keys}")

Output:

Duplicate values and their keys:
john.doe@example.com: ['C001', 'C003']
mary.smith@example.com: ['C002', 'C005']

This approach not only shows you which values are duplicated but also which keys are associated with them. It’s extremely useful for debugging or cleaning data in real-world applications.

Bonus Tip – Convert Duplicates into a Clean Report

When dealing with real business data, I often need to export duplicate findings into a readable format.
Here’s how you can convert duplicates into a simple report using Python’s pandas library.

import pandas as pd
from collections import defaultdict

customer_data = {
    "C001": "john.doe@example.com",
    "C002": "mary.smith@example.com",
    "C003": "john.doe@example.com",
    "C004": "lisa.jones@example.com",
    "C005": "mary.smith@example.com"
}

reverse_dict = defaultdict(list)

for key, value in customer_data.items():
    reverse_dict[value].append(key)

duplicates = {value: keys for value, keys in reverse_dict.items() if len(keys) > 1}

# Convert to DataFrame
df = pd.DataFrame([(v, k) for v, keys in duplicates.items() for k in keys], columns=["Email", "Customer_ID"])

# Save to CSV
df.to_csv("duplicate_customers.csv", index=False)

print("Duplicate report saved as 'duplicate_customers.csv'")

This creates a neat CSV file containing all duplicate emails and their corresponding customer IDs. It’s a practical solution when sharing reports with non-technical colleagues.

Common Mistakes to Avoid

Even experienced Python developers (including myself) make these mistakes sometimes:

  1. Using set() directly on dictionary values – This removes duplicates but doesn’t tell you which ones were repeated.
  2. Ignoring case sensitivity – “John@example.com” and “john@example.com” are treated as different values. Use .lower() if needed.
  3. Not handling None or empty strings – Always filter out invalid data before checking for duplicates.
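To guard against mistakes 2 and 3, I like to normalize values before checking for duplicates. Here’s a small sketch (the helper name normalized_duplicates and the sample data are my own) that lowercases and strips each email and skips None or empty entries:

```python
from collections import Counter

def normalized_duplicates(data):
    """Return duplicate email values after lowercasing, stripping
    whitespace, and skipping None/empty entries."""
    cleaned = [
        value.strip().lower()
        for value in data.values()
        if value and value.strip()
    ]
    counts = Counter(cleaned)
    return [email for email, count in counts.items() if count > 1]

customer_data = {
    "C001": "John.Doe@example.com",
    "C002": "john.doe@example.com",
    "C003": None,
    "C004": "",
    "C005": "lisa.jones@example.com",
}

print(normalized_duplicates(customer_data))
# Prints: ['john.doe@example.com']
```

Without the normalization, "John.Doe@example.com" and "john.doe@example.com" would slip through as two distinct values, and the None and empty entries could cause errors or false matches.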

By paying attention to these small details, you’ll make your Python scripts more reliable and professional.

Finding duplicate values in a Python dictionary might seem tricky at first, but as you’ve seen, there are multiple ways to handle it efficiently. Whether you prefer the simplicity of loops or the power of the collections module, Python gives you all the tools you need.

Personally, I rely on the Counter-based method for most of my data-cleaning tasks because it’s fast, clean, and scalable. However, if you’re dealing with smaller datasets or just learning Python, the loop-based method is a great place to start.
