How to Create a Scatter Plot in Pandas

As a Python developer who has spent years wrangling data, I’ve found that nothing reveals the relationship between two variables faster than a scatter plot.

Whether I am analyzing housing prices in California or tracking tech stock trends on the NASDAQ, a scatter plot is my go-to tool for spotting outliers.

In this tutorial, I will show you exactly how to generate scatter plots directly from your Pandas DataFrames using various methods I use in my daily workflow.

The Basics of Pandas Scatter Plots

Before we get into the advanced styling, let’s look at the simplest way to get a plot on your screen.

Pandas has a built-in .plot() method that wraps around Matplotlib, making it incredibly convenient for quick data exploration.

For our examples, let’s use a dataset representing different cities in the USA, looking at their population density and average rent prices.

import pandas as pd
import matplotlib.pyplot as plt

# Sample data: US City Metrics
data = {
    'City': ['New York', 'San Francisco', 'Austin', 'Seattle', 'Miami', 'Chicago', 'Denver', 'Boston'],
    'Population_Density': [27012, 18633, 3006, 8791, 12239, 11847, 4674, 13976],
    'Avg_Rent': [3800, 3500, 1800, 2200, 2500, 2100, 1900, 3100]
}

df = pd.DataFrame(data)

# The standard kind='scatter' approach
df.plot(kind='scatter', x='Population_Density', y='Avg_Rent', color='blue', title='US City Density vs Rent')

plt.show()

In my experience, passing kind='scatter' keeps the chart type explicit in the call, which makes the code easy to read and maintain over the long term.
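Since .plot() wraps Matplotlib, the object it returns is a regular Matplotlib Axes. Here is a minimal sketch of that idea, assuming a small frame (df_demo is a hypothetical name, with values copied from the city data above). It shows that both call styles return an Axes you can keep customizing.

```python
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# Small illustrative frame (df_demo is a made-up name for this sketch)
df_demo = pd.DataFrame({
    'Population_Density': [27012, 18633, 3006, 8791],
    'Avg_Rent': [3800, 3500, 1800, 2200],
})

# Both spellings draw a scatter plot and return a Matplotlib Axes
ax_kind = df_demo.plot(kind='scatter', x='Population_Density', y='Avg_Rent')
ax_accessor = df_demo.plot.scatter(x='Population_Density', y='Avg_Rent')

# Because the result is a normal Axes, the usual Matplotlib methods apply
ax_accessor.set_title('US City Density vs Rent')

# Pandas fills in the axis labels from the column names
print(ax_accessor.get_xlabel(), '/', ax_accessor.get_ylabel())
```

Call plt.show() afterwards to display the figures.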

Method 1: Use the DataFrame.plot.scatter() Wrapper

While kind='scatter' works great, Pandas also provides a direct accessor called .plot.scatter().

I personally prefer this syntax because it feels more “Pythonic” and allows for better IDE autocompletion.

In this example, let’s look at a US-specific scenario: the relationship between years of experience and annual salary in the American tech industry.

import pandas as pd
import matplotlib.pyplot as plt

# Tech Salary Data in the USA
salary_data = {
    'Years_Experience': [1, 2, 3, 5, 8, 10, 12, 15, 20],
    'Annual_Salary_USD': [75000, 82000, 95000, 120000, 155000, 175000, 210000, 240000, 300000]
}

df_salary = pd.DataFrame(salary_data)

# Creating the plot using the direct scatter method
df_salary.plot.scatter(x='Years_Experience', 
                       y='Annual_Salary_USD', 
                       grid=True, 
                       figsize=(10, 6),
                       title='Tech Salary Growth in the USA')

plt.ylabel('Salary (USD)')
plt.xlabel('Years of Experience')
plt.show()

Adding grid=True is a small tip I always recommend. It makes it much easier for your stakeholders to read the X and Y values of specific data points off the axes.

Method 2: Add Color Maps (c) and Point Size (s)

When I am dealing with multi-dimensional data, a simple 2D plot isn’t enough. I often use color and size to represent a third and fourth variable.

Suppose we are looking at US gas stations. We want to see the relationship between the number of pumps (X) and daily customers (Y). We also want to color-code the points by gas price and size them by store square footage.

import pandas as pd
import matplotlib.pyplot as plt

# US Gas Station Performance
station_data = {
    'Pumps': [4, 8, 12, 16, 6, 10, 14, 20],
    'Daily_Customers': [200, 450, 700, 1100, 310, 600, 850, 1500],
    'Gas_Price': [3.10, 3.45, 3.80, 4.10, 3.20, 3.60, 3.95, 4.50],
    'Store_Size_SQFT': [500, 1500, 2500, 4000, 800, 2000, 3000, 5000]
}

df_stations = pd.DataFrame(station_data)

# Using 'c' for color and 's' for size
df_stations.plot.scatter(x='Pumps', 
                         y='Daily_Customers', 
                         c='Gas_Price', 
                         s=df_stations['Store_Size_SQFT'] * 0.1, 
                         colormap='viridis', 
                         alpha=0.7,
                         title='US Gas Station Analytics')

plt.show()

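In this code, the values passed to the s parameter (size) are the square footage multiplied by 0.1. Matplotlib interprets s as the marker area in points squared, so the raw square footage values (500 to 5,000) would produce markers large enough to cover most of the plot.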
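A fixed multiplier like 0.1 only works for one dataset. As an alternative, here is a rough sketch that rescales any numeric column into a chosen range of marker areas. The helper scale_marker_sizes and its default bounds (20 and 400) are my own assumptions for illustration, not part of Pandas or Matplotlib.

```python
import pandas as pd
import matplotlib.pyplot as plt

def scale_marker_sizes(values, min_area=20, max_area=400):
    """Hypothetical helper: linearly map values onto marker areas (points squared)."""
    values = pd.Series(values, dtype='float64')
    span = values.max() - values.min()
    if span == 0:
        # All values are equal, so give every marker the mid-range area
        return pd.Series((min_area + max_area) / 2, index=values.index)
    return min_area + (values - values.min()) / span * (max_area - min_area)

# A few rows from the gas station data above
df_sizes = pd.DataFrame({
    'Pumps': [4, 8, 12, 16],
    'Daily_Customers': [200, 450, 700, 1100],
    'Store_Size_SQFT': [500, 1500, 2500, 4000],
})

sizes = scale_marker_sizes(df_sizes['Store_Size_SQFT'])

# The smallest store gets area 20, the largest gets area 400
ax = df_sizes.plot.scatter(x='Pumps', y='Daily_Customers', s=sizes, alpha=0.7)
```

Call plt.show() afterwards to display the figure. The trade-off is that marker areas now show relative size within the dataset rather than a fixed fraction of the raw value.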

Method 3: Group and Plot Multiple Series

Often, I need to compare two groups in the same chart, such as “New York” vs. “Texas”.

The trick here is to capture the Axes object (ax) returned by the first plot call and pass it into the subsequent plot calls with ax=ax. This draws every series on the same chart.

import pandas as pd
import matplotlib.pyplot as plt

# Comparing Two States: High School Scores
ny_scores = {'Math': [88, 92, 95, 78], 'Science': [90, 85, 99, 82]}
tx_scores = {'Math': [75, 80, 85, 90], 'Science': [70, 78, 88, 92]}

df_ny = pd.DataFrame(ny_scores)
df_tx = pd.DataFrame(tx_scores)

# Overlaying two scatter plots
ax = df_ny.plot.scatter(x='Math', y='Science', color='Blue', label='New York')
df_tx.plot.scatter(x='Math', y='Science', color='Red', label='Texas', ax=ax)

plt.title('Comparison of Student Scores by State')
plt.show()

I find this method invaluable when I am presenting competitive analysis or A/B testing results to my team.
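The example above starts from two separate DataFrames. When the groups live in a single long-format DataFrame instead, the same overlay idea can be sketched with groupby. The frame df_scores, the 'State' column, and the colors dictionary are assumptions made up for this illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same scores as above, stored in one frame with a hypothetical 'State' column
df_scores = pd.DataFrame({
    'State': ['New York'] * 4 + ['Texas'] * 4,
    'Math': [88, 92, 95, 78, 75, 80, 85, 90],
    'Science': [90, 85, 99, 82, 70, 78, 88, 92],
})

# One color per group (an arbitrary choice for this sketch)
colors = {'New York': 'blue', 'Texas': 'red'}

# Create the Axes once, then draw every group onto it
fig, ax = plt.subplots()
for state, group in df_scores.groupby('State'):
    group.plot.scatter(x='Math', y='Science',
                       color=colors[state], label=state, ax=ax)

ax.set_title('Comparison of Student Scores by State')
```

Call plt.show() afterwards to display the figure. The loop scales to any number of groups, as long as the colors dictionary has an entry for each one.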

Handle Overlapping Data with Alpha and Jitter

One common headache I face is “overplotting,” where many dots land on the exact same coordinate. This hides the true density of the data.

To solve this, I use the alpha parameter to make dots semi-transparent.

If I have 1,000 retail transactions at a US Walmart location that all share the same price point, alpha=0.5 will show darker clusters where more shoppers are buying.

# Large dataset simulation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulating 500 shoppers in a US store
np.random.seed(42)
df_shoppers = pd.DataFrame({
    'Items_Bought': np.random.randint(1, 15, 500),
    'Total_Spent': np.random.randint(10, 200, 500)
})

df_shoppers.plot.scatter(x='Items_Bought', y='Total_Spent', alpha=0.3, color='forestgreen')
plt.title('US Retail Shopping Patterns (Alpha Transparency)')
plt.show()
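Alpha helps, but integer values such as Items_Bought still stack into vertical stripes. Jitter spreads them out by adding a small random offset to the plotted coordinate. The following is a minimal sketch of one common way to do it, assuming uniform noise and a jitter_width of 0.3 (both are my own choices, and Pandas has no built-in jitter option that I rely on here).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Simulated shoppers, same shape as the example above
df_jitter = pd.DataFrame({
    'Items_Bought': rng.integers(1, 15, 500),
    'Total_Spent': rng.integers(10, 200, 500),
})

# Assumed width: keep it below 0.5 so neighboring integers do not blend together
jitter_width = 0.3

# Add noise to a separate column so the original data stays untouched
noise = rng.uniform(-jitter_width, jitter_width, len(df_jitter))
df_jitter['Items_Jittered'] = df_jitter['Items_Bought'] + noise

ax = df_jitter.plot.scatter(x='Items_Jittered', y='Total_Spent',
                            alpha=0.3, color='forestgreen')
ax.set_xlabel('Items Bought (jittered)')
ax.set_title('US Retail Shopping Patterns (Alpha + Jitter)')
```

Call plt.show() afterwards to display the figure. Jitter changes only where the points are drawn, so I label the axis to make it clear the positions are approximate.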

Customize Your Scatter Plot for Professional Reports

If you are publishing your findings on a blog or a company report, the default Matplotlib styles can look a bit dated.

I always take a few extra seconds to clean up the labels and add a professional color palette.

Here is a full example that creates a “Publication-Ready” scatter plot using US Census-style data on Education vs. Income.

import pandas as pd
import matplotlib.pyplot as plt

# US Education and Income Data
data = {
    'Education_Years': [12, 14, 16, 18, 20, 12, 16, 18, 14, 16],
    'Annual_Income_K': [45, 55, 85, 120, 150, 42, 90, 115, 60, 88],
    'State_Tax_Rate': [0.05, 0.06, 0.04, 0.07, 0.05, 0.03, 0.08, 0.06, 0.04, 0.05]
}

df_edu = pd.DataFrame(data)

# Advanced customization
ax = df_edu.plot.scatter(
    x='Education_Years', 
    y='Annual_Income_K', 
    c='State_Tax_Rate', 
    colormap='RdYlGn', 
    s=100, 
    edgecolor='black', 
    linewidth=1,
    sharex=False # Keeps the X-axis label visible when using colorbars
)

ax.set_title('Impact of Education on Income in the USA', fontsize=14, pad=20)
ax.set_xlabel('Years of Education', fontsize=12)
ax.set_ylabel('Annual Income (in $1,000s)', fontsize=12)

plt.tight_layout()
plt.show()

Using edgecolor='black' makes the points pop, especially when the colormap includes light shades that would otherwise blend into the background.

Why use Pandas over Matplotlib for Scatter Plots?

You might wonder why I don’t just use plt.scatter() every time.

The reason is speed of development, not execution speed. When your data is already in a DataFrame, Pandas sets the axis labels from the column names and builds the legend from any label you pass.

If I use Matplotlib directly, I have to manually extract each column and write extra lines of code for the legend. Pandas does the “heavy lifting” so I can focus on the analysis.
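To make the comparison concrete, here is a small side-by-side sketch. The frame df_cmp is a hypothetical subset of the salary data from earlier, and the two-panel layout is just my choice for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical subset of the salary data used earlier
df_cmp = pd.DataFrame({
    'Years_Experience': [1, 2, 3, 5, 8],
    'Annual_Salary_USD': [75000, 82000, 95000, 120000, 155000],
})

fig, (ax_mpl, ax_pd) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib directly: extract each column and set the labels by hand
ax_mpl.scatter(df_cmp['Years_Experience'], df_cmp['Annual_Salary_USD'])
ax_mpl.set_xlabel('Years_Experience')
ax_mpl.set_ylabel('Annual_Salary_USD')
ax_mpl.set_title('Matplotlib scatter')

# Pandas: name the columns once and the axis labels come for free
df_cmp.plot.scatter(x='Years_Experience', y='Annual_Salary_USD',
                    ax=ax_pd, title='DataFrame.plot.scatter')

fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Both panels show the same points; the difference is only in how much code each one needs.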

Common Errors to Avoid

Throughout my years of coding, I’ve seen beginners trip up on the same few things. Here is how to avoid them:

  1. Missing 'x' or 'y': Unlike some other plots, a scatter plot requires both X and Y columns. If you leave one out, df.plot(kind='scatter') raises a ValueError, while df.plot.scatter() raises a TypeError because x and y are required arguments.
  2. Non-Numeric Data: Numbers stored as strings will not plot as numeric values. I always ensure my columns are float or int before plotting. Use df['col'] = pd.to_numeric(df['col']) to convert them, and pass errors='coerce' if some entries cannot be parsed and should become NaN instead of raising an error.
  3. The Colorbar Fix: Sometimes when you add a colorbar, the X-axis label disappears. I always add sharex=False inside the plot function to work around this.
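The three points above can be sketched in one short script. The frame df_raw and its deliberately bad 'twelve' entry are made up for illustration, and dropping the unparseable row is just one possible way to handle it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical raw data: 'Pumps' arrived as text, with one unparseable entry
df_raw = pd.DataFrame({
    'Pumps': ['4', '8', 'twelve', '16'],
    'Daily_Customers': [200, 450, 700, 1100],
})

# 1. Leaving out x and y with kind='scatter' raises a ValueError
missing_xy_error = None
try:
    df_raw.plot(kind='scatter')
except ValueError as exc:
    missing_xy_error = exc
    print(f'ValueError: {exc}')

# 2. Convert text to numbers; errors='coerce' turns bad entries into NaN
df_clean = df_raw.copy()
df_clean['Pumps'] = pd.to_numeric(df_clean['Pumps'], errors='coerce')
df_clean = df_clean.dropna(subset=['Pumps'])  # drop the row that failed to parse

# 3. Pass sharex=False when using a colorbar so the X-axis label stays visible
ax = df_clean.plot.scatter(x='Pumps', y='Daily_Customers',
                           c='Daily_Customers', colormap='viridis',
                           sharex=False)
```

Call plt.show() afterwards to display the figure. Before dropping rows in real data, I would check how many entries were coerced to NaN, since a large number usually points to a formatting problem upstream.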

Summary of Scatter Plot Methods

| Method | Best For | Difficulty |
| --- | --- | --- |
| df.plot(kind='scatter') | Quick data checks | Easy |
| df.plot.scatter() | Standard development | Easy |
| ax parameter overlay | Comparing two datasets | Intermediate |
| c and s arguments | Multidimensional analysis | Intermediate |

I hope you found this tutorial helpful! Creating scatter plots in Pandas is an essential skill for any data scientist working with Python.

Whether you are looking at US economic trends or personal health data, these techniques will help you visualize your data clearly and effectively.
