Python Split Regex

I have spent over a decade wrangling messy data in Python. One thing I’ve learned is that the standard split() method is great, but it often falls short.

When you are dealing with real-world strings that mix several different delimiters, you need something more powerful. That is where splitting with a regular expression (regex) comes in handy.

I use the re.split() function almost daily to handle complex formatting. It allows me to define a pattern once and let Python do the heavy lifting of breaking the string apart.

In this tutorial, I will show you exactly how to split strings with regex in Python, using practical examples. I have personally used these methods to parse everything from CSV exports to messy log files.

The Problem with the Standard split() Method

If you have a simple sentence, the built-in split() works fine. But imagine you have a list of US cities and zip codes separated by commas, semicolons, and spaces.

The standard method can only handle one delimiter at a time. If you try to use it on a string with mixed punctuation, you end up writing multiple lines of code to replace characters before you can even start splitting.

Using regex collapses this into a single line. The code is shorter and much easier to maintain as your project grows.
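To make the difference concrete, here is a minimal sketch that compares the two approaches. The sample record is a made-up value for illustration, not data from a real system.

```python
import re

# Hypothetical sample record mixing commas, semicolons, and spaces
record = "Boston,MA;02108 Denver,CO;80202"

# Without regex: normalize every delimiter to a comma first, then split
normalized = record.replace(";", ",").replace(" ", ",")
manual_parts = normalized.split(",")

# With regex: one pattern covers all three delimiters
regex_parts = re.split(r"[;, ]", record)

print(manual_parts)
# ['Boston', 'MA', '02108', 'Denver', 'CO', '80202']
print(manual_parts == regex_parts)
# True
```

Both versions give the same list, but the regex version does not need a new replace() call every time another delimiter shows up.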

Method 1: Split with Multiple Delimiters

This is the most common reason I reach for the re module. Suppose you are processing a contact list from a legacy database in the US.

The data might look like this: “New York,NY;10001 212-555-0198”. Notice how we have a comma, a semicolon, and a space all acting as separators.

Here is the code I use to handle this:

import re

# A typical messy data string from a US-based contact list
contact_info = "New York,NY;10001 212-555-0198"

# We use the square brackets [] in regex to define a set of delimiters
# Here, we split by comma, semicolon, or space
result = re.split(r'[;, ]', contact_info)

print(result)
# Output: ['New', 'York', 'NY', '10001', '212-555-0198']

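By putting the characters inside [], I am telling Python to split the string whenever it sees any one of those characters. It’s a clean, one-line solution. Keep in mind that the space inside “New York” counts as a delimiter too, which is why the city name comes back as two separate items.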
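If the same delimiter set is applied to many records, one option is to compile the pattern once and reuse it. The sketch below assumes a small list of made-up records in the same format. The name DELIMITERS is just an illustrative choice.

```python
import re

# Compile the delimiter set once so it can be reused for every record
DELIMITERS = re.compile(r"[;, ]")

# Hypothetical records in the same format as the example above
records = [
    "Chicago,IL;60601 312-555-0147",
    "Houston,TX;77002 713-555-0162",
]

rows = [DELIMITERS.split(record) for record in records]

for row in rows:
    print(row)
# ['Chicago', 'IL', '60601', '312-555-0147']
# ['Houston', 'TX', '77002', '713-555-0162']
```

The hyphen is not part of the set, so each phone number stays in one piece.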

Method 2: Handle Multiple Occurrences of Delimiters

Sometimes your data is even messier. You might have double spaces or multiple commas where there should only be one.

If you use the previous method, you will end up with empty strings in your list. To fix this, I add a + sign to my regex pattern.

The + stands for “one or more.” It treats consecutive delimiters as a single split point.

import re

# Data with accidental double delimiters
raw_data = "Los Angeles,,CA  90001;;;310-555-0123"

# Adding the '+' treats each run of consecutive delimiters as one split point
clean_list = re.split(r'[ ,;]+', raw_data)

print(clean_list)
# Output: ['Los', 'Angeles', 'CA', '90001', '310-555-0123']

I find this incredibly useful when dealing with user-generated text where people might accidentally hit the spacebar twice.
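One caveat worth knowing: the + only merges delimiters that sit next to each other. If the string starts or ends with a delimiter, re.split() still returns an empty string at that edge. Here is a small sketch with a made-up input that shows the behavior and two possible ways to deal with it.

```python
import re

# Hypothetical input with delimiters at the very start and end
padded = "  Seattle,,WA  98101;; "

parts = re.split(r"[ ,;]+", padded)
print(parts)
# ['', 'Seattle', 'WA', '98101', '']

# Option 1: drop the empty strings after splitting
cleaned = [part for part in parts if part]
print(cleaned)
# ['Seattle', 'WA', '98101']

# Option 2: strip the delimiter characters from both ends before splitting
stripped = re.split(r"[ ,;]+", padded.strip(" ,;"))
print(stripped)
# ['Seattle', 'WA', '98101']
```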

Method 3: Limit the Number of Splits

There are times when I only want to split the first part of a string and keep the rest intact, for instance when I am parsing a log entry from a server in Chicago.

The log might have a timestamp, an error level, and then a long message that contains spaces. I don’t want to split the message itself.

I use the maxsplit argument for this.

import re

# A log entry where we only want to extract the first two parts
log_entry = "2024-04-20 ERROR The system in the Chicago data center failed to respond."

# Split only at the first 2 occurrences of whitespace
parts = re.split(r'\s+', log_entry, maxsplit=2)

print(parts)
# Output: ['2024-04-20', 'ERROR', 'The system in the Chicago data center failed to respond.']

By setting maxsplit=2, Python stops splitting after the second match, leaving the rest of the text as a single string.
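Because maxsplit=2 produces at most three items, the result can be unpacked straight into named variables. This is a minimal sketch that reuses the log entry from above. The variable names are my own choice.

```python
import re

log_entry = "2024-04-20 ERROR The system in the Chicago data center failed to respond."

# maxsplit=2 yields at most three items, so they can be unpacked by name
date, level, message = re.split(r"\s+", log_entry, maxsplit=2)

print(date)
# 2024-04-20
print(level)
# ERROR
print(message)
# The system in the Chicago data center failed to respond.
```

The unpacking assumes every line really has three parts. A line with fewer fields would raise a ValueError, so real log parsing may need a length check first. I also pass maxsplit by keyword, since recent Python versions deprecate passing it positionally.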

Method 4: Keep the Delimiters in the Result

Usually, when you split a string, the delimiters are thrown away. But occasionally, I need to know which delimiter was used to split the text.

To keep the delimiters, I wrap the regex pattern in parentheses (). This creates a “capturing group.”

import re

# A string where the delimiter itself is important
expression = "Revenue:50000+Bonus:5000-Tax:12000"

# Using parentheses to keep the delimiters (:, +, /, -) in the result
pieces = re.split(r'([:+/-])', expression)

print(pieces)
# Output: ['Revenue', ':', '50000', '+', 'Bonus', ':', '5000', '-', 'Tax', ':', '12000']

I have used this method many times when building simple parsers for mathematical expressions or custom configuration files.
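With a single capturing group, the result alternates between text and delimiter, so slicing can separate the two. The sketch below reuses the expression from above. The names fields and delimiters are only illustrative.

```python
import re

expression = "Revenue:50000+Bonus:5000-Tax:12000"
pieces = re.split(r"([:+/-])", expression)

# With one capturing group, text sits at even indexes and delimiters at odd ones
fields = pieces[::2]
delimiters = pieces[1::2]

print(fields)
# ['Revenue', '50000', 'Bonus', '5000', 'Tax', '12000']
print(delimiters)
# [':', '+', ':', '-', ':']

# Without the parentheses, the same pattern discards the delimiters
print(re.split(r"[:+/-]", expression) == fields)
# True
```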

Summary of Python Split Regex Patterns

Pattern      Description
r'[ ,;]'     Splits at each single space, comma, or semicolon.
r'[ ,;]+'    Splits at each run of one or more spaces, commas, or semicolons (no empty strings between consecutive delimiters).
r'\s+'       Splits at each run of one or more whitespace characters (spaces, tabs, newlines).
r'(\d+)'     Splits at each run of one or more digits and keeps the digits in the resulting list.
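The last two patterns in the table are not demonstrated with mixed input above, so here is a quick sketch of how they behave. The sample strings are made up for illustration.

```python
import re

# Hypothetical product code mixing letters and digits
code = "AB12CD345EF"

# The capturing group keeps each run of digits in the result
with_digits = re.split(r"(\d+)", code)
print(with_digits)
# ['AB', '12', 'CD', '345', 'EF']

# Without the group, the digits are discarded
without_digits = re.split(r"\d+", code)
print(without_digits)
# ['AB', 'CD', 'EF']

# \s+ treats tabs, newlines, and repeated spaces alike
words = re.split(r"\s+", "a\tb\nc  d")
print(words)
# ['a', 'b', 'c', 'd']
```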

Python’s re.split() is a versatile tool that has saved me countless hours of manual string cleaning. Whether you are dealing with US addresses, server logs, or complex financial data, mastering these regex patterns will make your code much cleaner.
