How to Sort by Column in Pandas

Sorting data is one of those fundamental tasks I find myself doing almost every single day when working with Python.

Whether I’m analyzing sales performance across different US states or organizing a list of tech employees by their hire dates, getting the order right is the first step toward finding insights.

In this guide, I will show you exactly how to use the sort_values() method in Pandas to organize your data efficiently.

I have spent years cleaning messy datasets, and I can tell you that mastering these sorting techniques will save you a massive amount of time during the data exploration phase.

The Basic Syntax of sort_values()

Before we jump into the examples, it is important to understand the tool we are using. In Pandas, the sort_values() method is the primary way to reorder the rows of a DataFrame by the values in one or more columns.

The syntax is straightforward, but it has some powerful arguments that allow you to control exactly how the sorting happens.

df.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

In most of my projects, I mainly focus on the by, ascending, and na_position parameters.

Sort a DataFrame by a Single Column

Let’s start with a practical example. Imagine we have a dataset of US tech companies and their current market valuations.

Usually, when I get this data, it’s in no particular order. If I want to see which company has the lowest valuation first, I sort by the valuation column in ascending order.

Here is how I do it:

import pandas as pd

# Creating a dataset of US Tech Companies
data = {
    'Company': ['Microsoft', 'Apple', 'Nvidia', 'Alphabet', 'Amazon'],
    'HQ_City': ['Redmond', 'Cupertino', 'Santa Clara', 'Mountain View', 'Seattle'],
    'Valuation_Trillions': [3.1, 2.9, 2.2, 1.8, 1.9]
}

df = pd.DataFrame(data)

# Sorting by Valuation in ascending order
sorted_df = df.sort_values(by='Valuation_Trillions')

print(sorted_df)

In this case, Pandas defaults to ascending order. If you want the biggest companies at the top, you simply set ascending=False.
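Here is a minimal sketch of the descending version. It reuses the illustrative company data from the example above; the variable name largest_first is my own.

```python
import pandas as pd

# Same illustrative data as the example above (HQ_City omitted for brevity)
df = pd.DataFrame({
    'Company': ['Microsoft', 'Apple', 'Nvidia', 'Alphabet', 'Amazon'],
    'Valuation_Trillions': [3.1, 2.9, 2.2, 1.8, 1.9],
})

# ascending=False puts the largest valuation at the top
largest_first = df.sort_values(by='Valuation_Trillions', ascending=False)

print(largest_first)
```

The original df is left in its original order, because sort_values() returns a new DataFrame by default.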

Sort by Multiple Columns

In real-world scenarios, sorting by one column isn’t always enough. I often encounter situations where I need to “break ties.”

For instance, if I am looking at a list of US real estate listings, I might want to sort by ‘State’ first, and then by ‘Price’ within that state.

Here is the code I use for multi-column sorting:

import pandas as pd

# Real Estate Listings Data
listings = {
    'Listing_ID': [101, 102, 103, 104, 105],
    'State': ['Texas', 'California', 'Texas', 'New York', 'California'],
    'Price_USD': [450000, 850000, 320000, 950000, 720000],
    'Beds': [3, 4, 3, 2, 3]
}

df_listings = pd.DataFrame(listings)

# Sort by State (Alphabetical) and then Price (Descending)
df_sorted = df_listings.sort_values(by=['State', 'Price_USD'], ascending=[True, False])

print(df_sorted)

Notice how I passed a list to both by and ascending. This gives me granular control over each column’s direction.

Handle Missing Values During Sorting

One thing that used to frustrate me early in my career was how NaN (null) values would mess up my reports.

By default, Pandas puts all missing values at the end of the list. However, depending on the US tax data or census reports I am working with, I might need them at the top.

You can control this using the na_position argument.

import pandas as pd
import numpy as np

# US Census Population Growth Data with some missing values
census_data = {
    'City': ['Austin', 'Phoenix', 'Denver', 'Seattle', 'Miami'],
    'Growth_Rate': [0.25, np.nan, 0.15, 0.12, np.nan]
}

df_census = pd.DataFrame(census_data)

# Sorting and putting missing values at the beginning
df_na_first = df_census.sort_values(by='Growth_Rate', na_position='first')

print(df_na_first)

Setting na_position='first' is incredibly helpful when you are trying to identify which data points need cleaning or follow-up.
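One detail that is easy to miss: na_position works independently of ascending. The sketch below reuses the census data from above and sorts in descending order; the names desc_default and desc_na_first are just illustrative.

```python
import numpy as np
import pandas as pd

df_census = pd.DataFrame({
    'City': ['Austin', 'Phoenix', 'Denver', 'Seattle', 'Miami'],
    'Growth_Rate': [0.25, np.nan, 0.15, 0.12, np.nan],
})

# Descending sort: the NaN rows still land at the end by default
desc_default = df_census.sort_values(by='Growth_Rate', ascending=False)

# Descending sort with the NaN rows moved to the top
desc_na_first = df_census.sort_values(
    by='Growth_Rate', ascending=False, na_position='first'
)

print(desc_default)
print(desc_na_first)
```

In other words, reversing the sort direction does not move the missing values; only na_position does.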

Sort by Index Instead of Values

Sometimes, I don’t want to sort by the content of the columns. I might have already performed an operation that shuffled my index, and I want to get back to the original order.

In these cases, I use sort_index(). This is particularly useful for time-series data, like US Stock Market hourly updates.

# Shuffle the rows of the df built in the first example (the index labels move with the rows)
df_shuffled = df.sample(frac=1)

# Restore the original order by sorting on the index labels
df_restored = df_shuffled.sort_index()
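For the time-series case, here is a small sketch with a datetime index. The prices and timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical hourly prices, deliberately out of chronological order
prices = pd.DataFrame(
    {'Price_USD': [101.5, 100.0, 102.3]},
    index=pd.to_datetime([
        '2024-03-01 11:00',
        '2024-03-01 09:00',
        '2024-03-01 10:00',
    ]),
)

# Oldest timestamp first
chronological = prices.sort_index()

# Newest timestamp first
newest_first = prices.sort_index(ascending=False)

print(chronological)
print(newest_first)
```

sort_index() accepts ascending just like sort_values(), so the same call handles both directions.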

Use a Custom Sorting Logic

There are times when alphabetical or numerical sorting just doesn’t cut it. For example, if I am sorting US clothing sizes (Small, Medium, Large, XL), standard sorting would put “Large” before “Medium.”

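To fix this, I use the key parameter. The key function receives the whole column as a Series and must return a Series of the same length; the rows are then ordered by the returned values instead of the original ones.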

import pandas as pd

# Custom sorting for Priority levels in a US Logistics dataset
priority_map = {'Low': 1, 'Medium': 2, 'High': 3, 'Critical': 4}

df_logistics = pd.DataFrame({
    'Shipment_ID': ['A1', 'B2', 'C3', 'D4'],
    'Priority': ['Medium', 'Critical', 'Low', 'High']
})

# The key function maps each label to its rank; rows are sorted by that rank
df_logistics_sorted = df_logistics.sort_values(
    by='Priority',
    key=lambda x: x.map(priority_map)
)

print(df_logistics_sorted)

This is a more advanced technique, but it is a lifesaver when dealing with categorical data that has a logical sequence.
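An alternative to key, not the approach used above, is to store the column as an ordered categorical so that Pandas knows the logical sequence. Here is a sketch for the clothing sizes example; the SKU values and the size_order list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical product data
df_sizes = pd.DataFrame({
    'SKU': ['T1', 'T2', 'T3', 'T4'],
    'Size': ['Medium', 'XL', 'Small', 'Large'],
})

# The logical order of the labels, smallest to largest
size_order = ['Small', 'Medium', 'Large', 'XL']

# Convert the column to an ordered categorical
df_sizes['Size'] = pd.Categorical(
    df_sizes['Size'], categories=size_order, ordered=True
)

# sort_values() now follows size_order instead of the alphabet
df_sizes_sorted = df_sizes.sort_values(by='Size')

print(df_sizes_sorted)
```

The advantage of this route is that the order is attached to the column itself, so every later sort respects it without passing key again.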

Sort In-Place vs. Creating a New Object

When I am working with massive datasets, say, millions of rows of US healthcare records, I have to be careful about memory usage.

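By default, sort_values() returns a new DataFrame and leaves the original untouched. Passing inplace=True reorders the existing DataFrame and returns None instead. Don't count on it to save memory, though: Pandas generally still builds the sorted data internally before swapping it in, so the main benefit is that you don't need a second variable.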

# Sorting the existing dataframe without creating a new variable
df.sort_values(by='Company', inplace=True)

Just be careful: once you use inplace=True, the original order is gone unless you have a way to revert it (like a saved index).
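The sketch below illustrates both points with the company data from earlier: inplace=True returns None, and the original row labels travel with the rows, so sort_index() can bring the original order back as long as ignore_index=True was not used.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Microsoft', 'Apple', 'Nvidia', 'Alphabet', 'Amazon'],
    'Valuation_Trillions': [3.1, 2.9, 2.2, 1.8, 1.9],
})

# inplace=True modifies df and returns None, so never assign the result
result = df.sort_values(by='Company', inplace=True)
print(result)  # prints None

# df is now alphabetical, but each row kept its original index label
print(df)

# Sorting on those labels recovers the original row order
restored = df.sort_index()
print(restored)
```

This only works because the default index (0, 1, 2, ...) recorded the original position of each row.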

How to Sort by Date Columns

If you are analyzing US economic trends over time, you will definitely be sorting by dates. One common mistake I see is trying to sort date strings.

Always ensure your column is a datetime object first.

import pandas as pd

# US Federal Holidays or Event Data
events = {
    'Event': ['Labor Day', 'July 4th', 'New Years', 'Christmas'],
    'Date': ['2024-09-02', '2024-07-04', '2024-01-01', '2024-12-25']
}

df_events = pd.DataFrame(events)

# Convert string to datetime
df_events['Date'] = pd.to_datetime(df_events['Date'])

# Sort by Date
df_events_sorted = df_events.sort_values(by='Date')

print(df_events_sorted)
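The ISO-style strings above (YYYY-MM-DD) happen to sort correctly even as text. The problem shows up with other formats. Here is a small sketch, with made-up events in US MM/DD/YYYY format, comparing a string sort with a datetime sort.

```python
import pandas as pd

# Hypothetical events with US-style date strings
df_dates = pd.DataFrame({
    'Event': ['Christmas 2023', 'New Years 2024', 'July 4th 2024'],
    'Date': ['12/25/2023', '01/01/2024', '07/04/2024'],
})

# Sorted as text: compared character by character, so the 2023 date ends up last
as_strings = df_dates.sort_values(by='Date')

# Convert with an explicit format, then sort chronologically
df_dates['Date'] = pd.to_datetime(df_dates['Date'], format='%m/%d/%Y')
as_dates = df_dates.sort_values(by='Date')

print(as_strings)
print(as_dates)
```

Passing format explicitly is optional, but it avoids any ambiguity between month-first and day-first dates.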

Case Sensitivity in String Sorting

By default, Pandas compares strings by their character code points, which means “Zebra” comes before “apple” because every uppercase letter has a lower code than any lowercase letter.

When I want a truly alphabetical list of US cities, I usually normalize the case using the key parameter.

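import pandas as pd

# Sorting strings case-insensitively
df_cities = pd.DataFrame({'City': ['atlanta', 'Boston', 'chicago', 'Denver']})

# Compare lowercase copies of the names; the stored values are not changed
df_sorted_cities = df_cities.sort_values(by='City', key=lambda col: col.str.lower())

print(df_sorted_cities)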
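To see the difference side by side, this sketch sorts the same city list with and without the key. The names default_order and case_insensitive are illustrative.

```python
import pandas as pd

df_cities = pd.DataFrame({'City': ['atlanta', 'Boston', 'chicago', 'Denver']})

# Default sort: capitalized names come before lowercase ones
default_order = df_cities.sort_values(by='City')

# Case-insensitive sort: compare lowercase versions of each name
case_insensitive = df_cities.sort_values(
    by='City', key=lambda col: col.str.lower()
)

print(default_order['City'].tolist())
print(case_insensitive['City'].tolist())
```

The key only affects the comparison; the original capitalization is preserved in the result.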

Practical Use Case: Analyze US Salary Data

To bring it all together, let’s look at a comprehensive example. Suppose we have a list of employees in a New York-based firm.

We want to find the highest-paid employees in each department, but we also want the departments listed alphabetically.

import pandas as pd

# Employee dataset
data = {
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Department': ['IT', 'HR', 'IT', 'Sales', 'HR', 'Sales'],
    'Salary_USD': [120000, 85000, 115000, 95000, 90000, 105000],
    'Years_Exp': [5, 3, 4, 6, 2, 7]
}

df_employees = pd.DataFrame(data)

# 1. Sort by Department (Ascending)
# 2. Sort by Salary (Descending) to see top earners first
df_final = df_employees.sort_values(
    by=['Department', 'Salary_USD'], 
    ascending=[True, False]
)

# Resetting index to make the report look clean
df_final = df_final.reset_index(drop=True)

print("Top Earners by Department:")
print(df_final)

This snippet combines multi-column sorting with an index reset, so the departments appear alphabetically and each department's top earner comes first within its group.
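If you only want one row per department, the single highest-paid employee, one option is to sort first and then keep the first row of each group. This is a sketch of that idea using drop_duplicates(); it is one of several ways to do it, and top_per_department is my own name.

```python
import pandas as pd

# Same employee data as above (Years_Exp omitted for brevity)
df_employees = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Department': ['IT', 'HR', 'IT', 'Sales', 'HR', 'Sales'],
    'Salary_USD': [120000, 85000, 115000, 95000, 90000, 105000],
})

# Sort so the top earner is the first row within each department
ranked = df_employees.sort_values(
    by=['Department', 'Salary_USD'],
    ascending=[True, False],
)

# Keep only the first row per department, then tidy the index
top_per_department = (
    ranked.drop_duplicates(subset='Department', keep='first')
          .reset_index(drop=True)
)

print(top_per_department)
```

Because the sort already placed the highest salary first in each group, keep='first' picks exactly the top earner.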

Sorting in Pandas is more than just a convenience; it’s a critical part of the data storytelling process. Once you get comfortable with sort_values, you’ll find that your data analysis becomes much more intuitive.

I hope you found this tutorial helpful! Whether you are just starting with Python or you are a seasoned developer looking for a quick reference, these sorting methods should cover almost every situation you encounter.

Bijay Kumar

Bijay Kumar is an experienced Python and AI professional who enjoys helping developers learn modern technologies through practical tutorials and examples. His expertise includes Python development, Machine Learning, Artificial Intelligence, automation, and data analysis using libraries like Pandas, NumPy, TensorFlow, Matplotlib, SciPy, and Scikit-Learn. At PythonGuides.com, he shares in-depth guides designed for both beginners and experienced developers.
