How to Read Tab-Delimited Files in Python?

In this tutorial, I will explain how to read tab-delimited files in Python. Tab-delimited files, commonly known as TSV (Tab-Separated Values) files, are a simple text format for storing tabular data: each line is a row, and tab characters separate the columns. Someone recently asked me how to read these files, so I explored the options and will share my findings in this post with examples.

Read Tab-Delimited Files in Python

Python provides several ways to read tab-delimited files. Let’s explore some of the most common methods.


1. Use the csv Module

The csv module in Python’s standard library supports reading and writing tab-delimited files. Here’s how you can use it:

import csv

file_path = 'data.tsv'

with open(file_path, newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

Output:

['Name', 'Age', 'City']
['John', '25', 'New York']
['Alice', '30', 'Los Angeles']
['Bob', '22', 'Chicago']


In this example, we open the file data.tsv and use the csv.reader with the delimiter set to \t (tab character). Each row is read as a list of strings.
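When the file has a header row, csv.DictReader can key each row by column name instead of position. The sketch below is a minimal illustration: it reads from an in-memory string (the `sample` variable is a stand-in for data.tsv, assumed to hold the same three columns) so it runs without any file on disk.

```python
import csv
import io

# Stand-in for data.tsv; columns are separated by real tab characters.
sample = (
    "Name\tAge\tCity\n"
    "John\t25\tNew York\n"
    "Alice\t30\tLos Angeles\n"
    "Bob\t22\tChicago\n"
)

with io.StringIO(sample, newline="") as file:
    # The first line is used as the field names for every later row.
    reader = csv.DictReader(file, delimiter="\t")
    rows = list(reader)

for row in rows:
    # Values are still strings; convert explicitly where a number is needed.
    print(row["Name"], int(row["Age"]), row["City"])
```

To read a real file, replace the io.StringIO call with open(file_path, newline='') as in the example above.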


2. Use the pandas Library

The pandas library provides a more flexible way to handle tab-delimited files. It reads the file into a DataFrame, a tabular data structure similar to a spreadsheet, with column names and data types inferred from the file.

import pandas as pd

file_path = 'data.tsv'
df = pd.read_csv(file_path, delimiter='\t')

print(df.head())

Output:

    Name  Age         City
0   John   25     New York
1  Alice   30  Los Angeles
2    Bob   22      Chicago

In this example, we use the read_csv function from pandas with the delimiter parameter set to \t (delimiter is an alias for sep, so sep='\t' works the same way). The head method displays the first five rows of the DataFrame by default.
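read_csv also accepts options that control which columns are loaded and how they are typed. The following sketch, which again uses an in-memory stand-in for data.tsv (the `sample` string is an assumption, not a file from this tutorial), loads only two columns and pins the type of Age.

```python
import io

import pandas as pd

# Stand-in for data.tsv with tab-separated columns.
sample = (
    "Name\tAge\tCity\n"
    "John\t25\tNew York\n"
    "Alice\t30\tLos Angeles\n"
    "Bob\t22\tChicago\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",                # same effect as delimiter='\t'
    usecols=["Name", "Age"], # skip columns that are not needed
    dtype={"Age": "int64"},  # fail early if Age is not an integer
)

print(df.dtypes)
print(df)
```

Limiting the columns with usecols can reduce memory use on wide files, since the skipped columns are never stored in the DataFrame.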


Handle Large Tab-Delimited Files

When dealing with large datasets, performance can become an issue. Here are some tips to handle large tab-delimited files efficiently:

1. Read the File in Chunks

Reading large files in chunks can help manage memory usage. The pandas library supports reading files in chunks using the chunksize parameter:

import pandas as pd

file_path = 'large_data.tsv'  # Ensure this file exists in your directory
chunk_size = 10000  # Number of rows per chunk

def process(chunk):
    """Replace this function with actual data processing logic."""
    print(f"Processing chunk with {len(chunk)} rows")
    print(chunk.head())  # Display the first few rows of each chunk

# Read and process file in chunks
for chunk in pd.read_csv(file_path, delimiter='\t', chunksize=chunk_size):
    process(chunk)  # Process each chunk

Output:

Processing chunk with 10 rows
   ID         Name  Age           City  Salary
0   1     John Doe   28       New York   55000
1   2  Alice Smith   34    Los Angeles   62000
2   3  Bob Johnson   23        Chicago   48000
3   4   Emma Brown   29        Houston   53000
4   5  Michael Lee   31  San Francisco   71000
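In practice, the processing step usually accumulates a result across chunks rather than printing each one. Here is a minimal sketch that computes an average salary without ever holding the whole file in memory. It assumes the same columns as the output above, and it uses a small in-memory string with a chunk size of 2 purely so the example is self-contained.

```python
import io

import pandas as pd

# Small stand-in for large_data.tsv (same columns as the output above).
sample = (
    "ID\tName\tAge\tCity\tSalary\n"
    "1\tJohn Doe\t28\tNew York\t55000\n"
    "2\tAlice Smith\t34\tLos Angeles\t62000\n"
    "3\tBob Johnson\t23\tChicago\t48000\n"
    "4\tEmma Brown\t29\tHouston\t53000\n"
    "5\tMichael Lee\t31\tSan Francisco\t71000\n"
)

total_salary = 0
row_count = 0

# Each iteration yields a DataFrame with at most `chunksize` rows.
for chunk in pd.read_csv(io.StringIO(sample), sep="\t", chunksize=2):
    total_salary += int(chunk["Salary"].sum())
    row_count += len(chunk)

average_salary = total_salary / row_count
print(f"Rows: {row_count}, average salary: {average_salary}")
```

Only the running totals survive between iterations, so peak memory depends on the chunk size rather than the file size.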


2. Use Dask for Parallel Processing

Dask is a parallel computing library that integrates seamlessly with pandas. It allows you to work with large datasets by parallelizing operations:

import dask.dataframe as dd

file_path = 'large_data.tsv'
df = dd.read_csv(file_path, delimiter='\t')

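print(df.head())

Dask splits the file into partitions and evaluates most operations lazily, running them only when you call .compute(). The head method is an exception: it computes immediately and returns a regular pandas DataFrame, so no extra .compute() call is needed. Because partitions can be processed in parallel, Dask can significantly speed up work on large datasets.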


Example: Analyze US Population Data

Let’s walk through a practical example of reading and analyzing a tab-delimited file containing US population data. Suppose we have a file named us_population.tsv with the following structure:

State    Population    Year
California    39538223    2020
Texas    29145505    2020
Florida    21538187    2020
New York    20201249    2020
Pennsylvania    13002700    2020
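If you want to follow along, the sample file can be created with the csv module's writer, which handles tab-delimited output the same way the reader handles input. This is only a convenience sketch: it writes us_population.tsv to the current working directory, using the rows shown above.

```python
import csv
from pathlib import Path

rows = [
    ["State", "Population", "Year"],
    ["California", 39538223, 2020],
    ["Texas", 29145505, 2020],
    ["Florida", 21538187, 2020],
    ["New York", 20201249, 2020],
    ["Pennsylvania", 13002700, 2020],
]

file_path = Path("us_population.tsv")

# newline="" lets the csv module control line endings itself.
with file_path.open("w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file, delimiter="\t")
    writer.writerows(rows)

print(file_path.read_text(encoding="utf-8"))
```

Note that "New York" contains a space, which is why the columns must be separated by real tab characters rather than spaces.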

Step 1: Read the File

First, we will read the file using pandas:

import pandas as pd

file_path = 'us_population.tsv'
df = pd.read_csv(file_path, delimiter='\t')

print(df)

Step 2: Data Cleaning

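Next, we will clean the data. For instance, we might want to ensure that the population column is of integer type. With the file shown above, pandas already reads Population as integers, and calling .str directly on an integer column raises an AttributeError. Converting to strings first makes the cleanup work whether or not the values contain thousands separators:

df['Population'] = df['Population'].astype(str).str.replace(',', '', regex=False).astype(int)
print(df.dtypes)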
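If the file really does store numbers with thousands separators, pandas can also strip them while parsing through the thousands parameter of read_csv. The sketch below uses a hypothetical comma-formatted variant of the data (the `sample` string and the `df_commas` name are made up for illustration).

```python
import io

import pandas as pd

# Hypothetical variant of the file where Population uses thousands separators.
sample = (
    "State\tPopulation\tYear\n"
    "California\t39,538,223\t2020\n"
    "Texas\t29,145,505\t2020\n"
)

# thousands="," tells the parser to ignore commas inside numeric fields.
df_commas = pd.read_csv(io.StringIO(sample), sep="\t", thousands=",")

print(df_commas.dtypes)
print(df_commas)
```

Handling the separators at read time avoids a separate cleaning pass over the column.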


Step 3: Data Analysis

Now, let’s perform some basic data analysis. We will calculate the total population of the states in the file and the average population per state. Because the file lists only five states, the total is not the population of the whole US:

total_population = df['Population'].sum()
average_population = df['Population'].mean()

print(f"Total Population of Listed States: {total_population}")
print(f"Average Population per State: {average_population}")
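A couple of further questions can be answered with the same DataFrame, such as which state is largest and what share of the listed total each state holds. The following is a rough sketch that rebuilds the data from an in-memory string so it runs on its own; the `Share` column and the `largest` and `ranked` names are introduced here for illustration.

```python
import io

import pandas as pd

# Same rows as us_population.tsv, held in memory for a self-contained example.
sample = (
    "State\tPopulation\tYear\n"
    "California\t39538223\t2020\n"
    "Texas\t29145505\t2020\n"
    "Florida\t21538187\t2020\n"
    "New York\t20201249\t2020\n"
    "Pennsylvania\t13002700\t2020\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t")

# idxmax returns the index label of the row with the largest value.
largest = df.loc[df["Population"].idxmax(), "State"]

# Each state's fraction of the total across the listed states only.
df["Share"] = df["Population"] / df["Population"].sum()

ranked = df.sort_values("Population", ascending=False)

print(f"Largest state: {largest}")
print(ranked[["State", "Share"]])
```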

Step 4: Data Visualization

Finally, we can visualize the data using matplotlib:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.bar(df['State'], df['Population'], color='skyblue')
plt.xlabel('State')
plt.ylabel('Population')
plt.title('US Population by State in 2020')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


Conclusion

In this tutorial, I explained how to read tab-delimited files in Python using two methods: the csv module and the pandas library. I also covered two ways to handle large tab-delimited files, reading them in chunks and using Dask for parallel processing, and walked through a practical example step by step.
