Pandas Split Column by Delimiter

I’ve found that raw datasets are rarely ever “clean” right out of the gate. I often come across columns where multiple pieces of information are crammed into a single cell, separated by a comma, a space, or a hyphen.

Whether it’s a full name that needs to be separated into “First” and “Last” or a full address string, knowing how to split these strings efficiently is a lifesaver.

In this tutorial, I’ll show you exactly how I handle splitting columns by delimiters in pandas using practical, real-world examples.

The pandas str.split() Method

The primary tool I use for this task is the str.split() method. It is part of the pandas “str” accessor, which allows you to apply string operations to an entire Series.

The beauty of this method is its simplicity. You tell pandas what the separator is, and it returns a list of strings for every row.

Let’s look at a common scenario: handling American names in a marketing dataset.

Example: Split Full Names into First and Last

Suppose we have a list of customers from New York, but their names are stored as “LastName, FirstName”. We need them in separate columns for a personalized email campaign.

import pandas as pd

# Creating a sample dataset of US-based customers
data = {
    'Customer_ID': [101, 102, 103, 104],
    'Full_Name': ['Smith, John', 'Johnson, Sarah', 'Williams, Michael', 'Brown, Emily'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# Splitting the Full_Name column by the comma delimiter
# expand=True is the secret sauce that turns the result into a DataFrame
df[['Last_Name', 'First_Name']] = df['Full_Name'].str.split(',', expand=True)

# Displaying the result
print(df)

You can see the output in the screenshot below.

Pandas Split Column by Delimiter

In the code above, I used expand=True. Without this, pandas would just give you a column containing lists. By expanding, it creates new columns that we can directly map to our DataFrame.

Handle Whitespace After Splitting

One thing I noticed early in my career is that delimiters often come with trailing spaces. In the example above, “Smith, John” split at the comma leaves a leading space in ” John”.

This can break your data merges or filters later on. I always recommend chaining the str.strip() method to clean this up.

# Splitting and stripping whitespace in one go
df[['Last_Name', 'First_Name']] = df['Full_Name'].str.split(',', expand=True)

# Clean up the First_Name column
df['First_Name'] = df['First_Name'].str.strip()

print(df['First_Name'].tolist())
# Output: ['John', 'Sarah', 'Michael', 'Emily']

You can see the output in the screenshot below.

Split Pandas Column by Delimiter

Split Columns with Multiple Delimiters

Sometimes data isn’t as consistent as we’d like. You might encounter strings that use different characters to separate data.

In these cases, I use Regular Expressions (Regex) within the split method.

Imagine you have a column containing product codes and batch numbers separated by either a hyphen or a pipe character.

Example: Split US Product Codes

import pandas as pd

data = {
    'Product_Info': ['PROD-99|TX', 'PROD-102|CA', 'PROD-45|NY', 'PROD-88|FL']
}

df = pd.DataFrame(data)

# Using regex [\|-] to split by either a pipe (|) or a hyphen (-)
# Note: the pipe needs a backslash because it's a special regex character
df[['Type', 'ID', 'State']] = df['Product_Info'].str.split(r'[-|]', expand=True)

print(df)

You can see the output in the screenshot below.

Split Column by Delimiter in Pandas

By using r'[-|]’, I’m telling pandas to treat both the dash and the pipe as valid delimiters. This is incredibly powerful for messy logistical data.

Split into a Specific Number of Columns

There are times when a string contains many delimiters, but you only want to split off the first part.

For instance, if you have a US address string: “123 Main St, Seattle, WA, 98101”. If you only need to isolate the street address from the rest, you can use the n parameter.

The n parameter defines the maximum number of splits to perform.

data = {
    'Location': ['123 Main St, Seattle, WA', '456 Oak Ave, Austin, TX', '789 Pine Rd, Miami, FL']
}

df = pd.DataFrame(data)

# Split only at the first comma
df[['Street', 'City_State']] = df['Location'].str.split(',', n=1, expand=True)

print(df)

You can see the output in the screenshot below.

Split Pandas Column by Delimiter using Python

In this case, n=1 ensures that only the first comma is used as a break point, leaving the City and State together in the second column.

Split and Expanding with Unknown Column Counts

What if you have a column where different rows have a different number of items? This happens often with “Tags” or “Category” columns.

If you use expand=True, pandas will automatically create enough columns to fit the longest list and fill the rest with None.

Example: Split Hobbies or Interests

data = {
    'User': ['Alice', 'Bob', 'Charlie'],
    'Interests': ['Hiking, Photography', 'Coding, Gaming, Cycling', 'Running']
}

df = pd.DataFrame(data)

# Split by comma
interests_df = df['Interests'].str.split(', ', expand=True)

# Rename columns dynamically
interests_df.columns = [f'Interest_{i+1}' for i in range(interests_df.shape[1])]

# Join back to original dataframe
df = pd.concat([df, interests_df], axis=1)

print(df)

This dynamic approach is how I handle surveys or user profile data where inputs are variable.

Deal with the “n” Parameter for Right-Side Splitting

Occasionally, you need to split starting from the right side of the string. While str.split() starts from the left, pandas offers str.rsplit() for right-side operations.

This is perfect if you have a string like “San Francisco-California-USA” and you only want to pull out the country from the end.

df = pd.DataFrame({'Location': ['San Francisco-California-USA', 'Austin-Texas-USA']})

# Split from the right, only once
df[['City_State', 'Country']] = df['Location'].str.rsplit('-', n=1, expand=True)

print(df)

Create a New Row for Each Split Value (Explode)

In more advanced data analysis, you might not want new columns. Instead, you might want a new row for every item in the split list.

I use the explode() method for this. It’s one of the most useful functions added to pandas in recent years.

Example: US Sales Territories

data = {
    'Manager': ['Michael', 'Dwight'],
    'Regions': ['PA, NY, NJ', 'OH, IN']
}

df = pd.DataFrame(data)

# 1. Split the string into a list
df['Regions'] = df['Regions'].str.split(', ')

# 2. Explode the list into separate rows
df_exploded = df.explode('Regions')

print(df_exploded)

This transforms your data into a “long” format, which is usually much better for creating Pivot Tables or performing GroupBy operations.

Performance Tips for Large Datasets

When I’m working with millions of rows of US Census data or financial transactions, I’ve noticed that str.split() can be a bit slow because it’s a Python-based loop under the hood.

If performance becomes an issue, here are a few things I do:

  1. Check for NaNs: Always ensure your column doesn’t have missing values before splitting, or use fillna(”) to avoid errors.
  2. Vectorization: Stick to pandas’ built-in methods rather than using .apply(lambda x: x.split()). The built-in str accessor is optimized.
  3. Data Types: If you have many repeated strings, converting the column to a ‘category’ type before processing can sometimes save memory, though you’ll need to convert back to a string to split.

In this post, I have shown you several ways to split a column by a delimiter in pandas.

I use these methods daily to clean up everything from customer names to complex logistics codes. The str.split() method with expand=True is usually the best starting point for most tasks.

If your data is more complex, don’t forget about rsplit() for right-side splitting or explode() for turning lists into new rows.

You may also like to read:

51 Python Programs

51 PYTHON PROGRAMS PDF FREE

Download a FREE PDF (112 Pages) Containing 51 Useful Python Programs.

pyython developer roadmap

Aspiring to be a Python developer?

Download a FREE PDF on how to become a Python developer.

Let’s be friends

Be the first to know about sales and special discounts.