How To Read Tab-Delimited Files In Python?

In this tutorial, I will explain how to read tab-delimited files in Python. Tab-delimited files, commonly known as TSV (Tab-Separated Values) files, store tabular data as plain text, with one record per line and a tab character between fields. Someone recently asked me how to read these files, which prompted me to explore the topic, and I will share my findings in this post with examples.

Read Tab-Delimited Files in Python

Python provides several ways to read tab-delimited files. Let’s explore some of the most common methods.

1. Use the csv Module

The csv module in Python’s standard library supports reading and writing tab-delimited files. Here’s how you can use it:

import csv

file_path = 'data.tsv'

with open(file_path, newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

Output:

['Name', 'Age', 'City']
['John', '25', 'New York']
['Alice', '30', 'Los Angeles']
['Bob', '22', 'Chicago']

In this example, we open the file data.tsv and use the csv.reader with the delimiter set to \t (tab character). Each row is read as a list of strings.
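
When the first row holds column names, csv.DictReader may be more convenient than csv.reader, because each row comes back as a dictionary keyed by the header. The following is a minimal sketch, not part of the original example: the sample data is held in memory with io.StringIO instead of being read from data.tsv, and the helper name read_tsv_as_dicts is hypothetical.

```python
import csv
import io

# Sample tab-delimited content standing in for data.tsv (an assumption for this sketch).
SAMPLE_TSV = "Name\tAge\tCity\nJohn\t25\tNew York\nAlice\t30\tLos Angeles\nBob\t22\tChicago\n"


def read_tsv_as_dicts(file_obj):
    """Return every row of a tab-delimited file object as a dict keyed by the header row."""
    reader = csv.DictReader(file_obj, delimiter='\t')
    return list(reader)


rows = read_tsv_as_dicts(io.StringIO(SAMPLE_TSV))
for row in rows:
    # The csv module returns every value as a string; convert numbers explicitly.
    print(row['Name'], int(row['Age']), row['City'])
```

For a real file, the same helper can be called inside a with open(file_path, newline='') block, as in the example above.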

2. Use the pandas Library

The pandas library provides a more convenient and flexible way to handle tab-delimited files. It reads the file into a DataFrame, a tabular data structure similar to a spreadsheet, with column types inferred automatically.

import pandas as pd

file_path = 'data.tsv'
df = pd.read_csv(file_path, delimiter='\t')

print(df.head())

Output:

    Name  Age         City
0   John   25     New York
1  Alice   30  Los Angeles
2    Bob   22      Chicago

In this example, we use the read_csv function from pandas with the delimiter parameter set to \t. The head method is used to display the first few rows of the DataFrame.
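
read_csv also accepts options that control what is loaded. The sketch below is an illustration rather than part of the original example: it uses sep (delimiter is an alias for it), usecols to load only some columns, and dtype to set a column type explicitly. The data is an in-memory copy of the sample rows, and the name df_subset is chosen only for this sketch.

```python
import io

import pandas as pd

# In-memory stand-in for data.tsv (an assumption for this sketch).
SAMPLE_TSV = "Name\tAge\tCity\nJohn\t25\tNew York\nAlice\t30\tLos Angeles\nBob\t22\tChicago\n"

df_subset = pd.read_csv(
    io.StringIO(SAMPLE_TSV),
    sep='\t',                 # same effect as delimiter='\t'
    usecols=['Name', 'Age'],  # load only the columns that are needed
    dtype={'Age': 'int64'},   # state the expected type instead of relying on inference
)

print(df_subset.dtypes)
print(df_subset)
```

Limiting the columns and fixing the types up front can also reduce memory use on larger files.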

Handle Large Tab-Delimited Files

When dealing with large datasets, performance can become an issue. Here are some tips to handle large tab-delimited files efficiently:

1. Read the File in Chunks

Reading large files in chunks can help manage memory usage. The pandas library supports reading files in chunks using the chunksize parameter:

import pandas as pd

file_path = 'large_data.tsv'  # Ensure this file exists in your directory
chunk_size = 10000  # Number of rows per chunk

def process(chunk):
    """Replace this function with actual data processing logic."""
    print(f"Processing chunk with {len(chunk)} rows")
    print(chunk.head())  # Display the first few rows of each chunk

# Read and process file in chunks
for chunk in pd.read_csv(file_path, delimiter='\t', chunksize=chunk_size):
    process(chunk)  # Process each chunk

Output:

Processing chunk with 10 rows
   ID         Name  Age           City  Salary
0   1     John Doe   28       New York   55000
1   2  Alice Smith   34    Los Angeles   62000
2   3  Bob Johnson   23        Chicago   48000
3   4   Emma Brown   29        Houston   53000
4   5  Michael Lee   31  San Francisco   71000
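
The process function above only prints each chunk. In practice, a common pattern is to keep a running aggregate so that only one chunk is in memory at a time. Here is a minimal sketch of that idea, assuming the Salary column shown in the output; the five rows are held in memory as a stand-in for large_data.tsv, and the chunk size is deliberately tiny so that the sample produces several chunks.

```python
import io

import pandas as pd

# Small in-memory sample standing in for large_data.tsv (an assumption for this sketch).
SAMPLE_TSV = (
    "ID\tName\tAge\tCity\tSalary\n"
    "1\tJohn Doe\t28\tNew York\t55000\n"
    "2\tAlice Smith\t34\tLos Angeles\t62000\n"
    "3\tBob Johnson\t23\tChicago\t48000\n"
    "4\tEmma Brown\t29\tHouston\t53000\n"
    "5\tMichael Lee\t31\tSan Francisco\t71000\n"
)

total_salary = 0
row_count = 0

# chunksize=2 splits the five rows into chunks of 2, 2, and 1 rows.
for chunk in pd.read_csv(io.StringIO(SAMPLE_TSV), sep='\t', chunksize=2):
    total_salary += int(chunk['Salary'].sum())  # accumulate per-chunk totals
    row_count += len(chunk)

average_salary = total_salary / row_count
print(f"Rows: {row_count}, average salary: {average_salary}")
```

Because only the running totals are kept between iterations, memory use depends on the chunk size rather than on the size of the file.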

2. Use Dask for Parallel Processing

Dask is a parallel computing library that integrates seamlessly with pandas. It allows you to work with large datasets by parallelizing operations:

import dask.dataframe as dd

file_path = 'large_data.tsv'
df = dd.read_csv(file_path, delimiter='\t')

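# head() already returns a regular pandas DataFrame, so no .compute() call is needed here
print(df.head())

Dask splits the file into partitions and can process them in parallel, which can speed up work on large datasets. Most operations are lazy and only run when you call .compute() on the result; head() is an exception and evaluates immediately.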

Example: Analyze US Population Data

Let’s walk through a practical example of reading and analyzing a tab-delimited file containing US population data. Suppose we have a file named us_population.tsv with the following structure:

State    Population    Year
California    39538223    2020
Texas    29145505    2020
Florida    21538187    2020
New York    20201249    2020
Pennsylvania    13002700    2020

Step 1: Read the File

First, we will read the file using pandas:

import pandas as pd

file_path = 'us_population.tsv'
df = pd.read_csv(file_path, delimiter='\t')

print(df)

Step 2: Data Cleaning

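Next, we will clean the data. For instance, we want to make sure that the Population column has an integer type. pandas already infers integers for the sample file above, but if the values contained thousands separators such as 39,538,223, the column would be read as strings. Converting to string first makes the cleanup work in both cases:

df['Population'] = df['Population'].astype(str).str.replace(',', '').astype(int)
print(df.dtypes)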
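
If a file is known to contain thousands separators, read_csv can strip them while parsing through its thousands parameter, which avoids a separate cleaning step. The sketch below uses a hypothetical variant of us_population.tsv with comma-separated digits, held in memory; the name df_clean is chosen only for this illustration.

```python
import io

import pandas as pd

# Hypothetical variant of us_population.tsv whose numbers contain thousands separators.
SAMPLE_TSV = (
    "State\tPopulation\tYear\n"
    "California\t39,538,223\t2020\n"
    "Texas\t29,145,505\t2020\n"
)

# thousands=',' tells the parser to ignore commas inside numeric fields.
df_clean = pd.read_csv(io.StringIO(SAMPLE_TSV), sep='\t', thousands=',')

print(df_clean.dtypes)
print(df_clean)
```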

Step 3: Data Analysis

Now, let's perform some basic data analysis. We will calculate the total and the average population of the states in the file. Note that the sample contains only five states, so the total is not the population of the whole US:

total_population = df['Population'].sum()
average_population = df['Population'].mean()

print(f"Total Population of Listed States: {total_population}")
print(f"Average Population per State: {average_population}")
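
Beyond totals and averages, it can be useful to rank the states and see each one's share of the combined population. The following is a rough sketch under the assumption that the DataFrame holds the five rows shown earlier; it is rebuilt here in memory so the example runs on its own, and the names df_states, ranked, and Share are introduced only for this illustration.

```python
import pandas as pd

# The five sample rows from us_population.tsv, rebuilt in memory for this sketch.
df_states = pd.DataFrame({
    'State': ['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'],
    'Population': [39538223, 29145505, 21538187, 20201249, 13002700],
    'Year': [2020] * 5,
})

# Sort from largest to smallest and renumber the rows.
ranked = df_states.sort_values('Population', ascending=False).reset_index(drop=True)

# Share of each state in the combined population of the listed states.
ranked['Share'] = ranked['Population'] / ranked['Population'].sum()

largest = ranked.loc[0, 'State']
print(ranked[['State', 'Population', 'Share']])
print(f"Largest state in the sample: {largest}")
```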

Step 4: Data Visualization

Finally, we can visualize the data using matplotlib:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.bar(df['State'], df['Population'], color='skyblue')
plt.xlabel('State')
plt.ylabel('Population')
plt.title('US Population by State in 2020')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Conclusion

In this tutorial, I explained how to read tab-delimited files in Python. I discussed two methods: the csv module and the pandas library. I also covered handling large tab-delimited files by reading them in chunks and by using Dask for parallel processing, and walked through a practical example step by step.
