How to Split a File into Multiple Files in Python?

Recently, in a Python webinar, someone asked me how to split a file into multiple files using Python. After some research and experimentation, I found three useful methods to accomplish this task. In this tutorial, I will explain each method along with suitable examples.

Split a File into Multiple Files in Python

Python provides several efficient ways to split a file into multiple smaller files.

Method 1: Use Basic Python

Step-by-Step Guide

  1. Read the Large File: Open the file and read its content.
  2. Split the Content: Divide the content into smaller parts.
  3. Write Smaller Files: Write these parts into new files.

Here’s a basic example to illustrate this:

def split_file(file_path, lines_per_file):
    with open(file_path, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    for file_count, i in enumerate(range(0, len(lines), lines_per_file), start=1):
        output_filename = f'output_file_{file_count}.txt'
        with open(output_filename, 'w', encoding='utf-8') as output_file:
            output_file.writelines(lines[i:i+lines_per_file])
        print(f"{output_filename} created successfully.")

if __name__ == "__main__":
    split_file('big_file.txt', 1000)

Output:

output_file_1.txt created successfully.
output_file_2.txt created successfully.
output_file_3.txt created successfully.
output_file_4.txt created successfully.
output_file_5.txt created successfully.


In this example, big_file.txt is split into smaller files of 1,000 lines each. This method is simple, but note that readlines() loads the entire file into memory, so it works best for moderately sized files.


Method 2: Use split Method

Python’s split method can divide a file’s content on a specific delimiter. For structured data such as CSV files, the csv module is more robust, but the idea is the same: break the content into chunks and write each chunk to its own file.
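As a minimal sketch of the split approach itself, the function below divides a text file wherever a delimiter appears. The blank-line delimiter and the record_*.txt output names are assumptions chosen for illustration, not part of the CSV example that follows:

```python
def split_by_delimiter(file_path, delimiter='\n\n'):
    # Read the whole file and split it wherever the delimiter appears.
    with open(file_path, 'r', encoding='utf-8') as file:
        parts = file.read().split(delimiter)

    # Write each part (a "record") to its own numbered file.
    for file_count, part in enumerate(parts, start=1):
        with open(f'record_{file_count}.txt', 'w', encoding='utf-8') as output_file:
            output_file.write(part)
```

This reads the entire file at once, so like Method 1 it suits small to moderately sized inputs.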

Example with CSV Files

Suppose you have a large CSV file containing sales data from Los Angeles. You can split this file based on rows:

import csv

def split_csv(file_path, rows_per_file):
    with open(file_path, 'r', newline='') as file:
        reader = csv.reader(file)
        headers = next(reader)
        rows = list(reader)

    file_count = 1
    for i in range(0, len(rows), rows_per_file):
        with open(f'sales_data_{file_count}.csv', 'w', newline='') as output_file:
            writer = csv.writer(output_file)
            writer.writerow(headers)
            writer.writerows(rows[i:i+rows_per_file])
        file_count += 1

if __name__ == "__main__":
    split_csv('sales_data.csv', 500)


This script reads the entire CSV file into memory, splits the rows into smaller chunks, and writes them into new CSV files. Each new file contains up to 500 rows of sales data, with the header row repeated at the top.
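If the CSV is too large to hold in memory, the same idea works in a streaming fashion: batch rows as you iterate the reader and flush each batch to disk. This is a sketch under the same assumptions as above; the sales_part_*.csv output names are hypothetical:

```python
import csv

def write_part(file_count, headers, batch):
    # Write one batch of rows, repeating the header in every part.
    with open(f'sales_part_{file_count}.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(headers)
        writer.writerows(batch)

def split_csv_streaming(file_path, rows_per_file):
    with open(file_path, 'r', newline='') as file:
        reader = csv.reader(file)
        headers = next(reader)
        batch, file_count = [], 0
        for row in reader:
            batch.append(row)
            if len(batch) == rows_per_file:
                file_count += 1
                write_part(file_count, headers, batch)
                batch = []
        if batch:  # leftover rows that didn't fill a full batch
            write_part(file_count + 1, headers, batch)
```

Only rows_per_file rows are held in memory at any moment, so this scales to files of any size.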


Method 3: Use Pandas Library

For more advanced file handling, the Pandas library is an excellent choice. It provides powerful data structures and functions for data manipulation and analysis.

Install Pandas

First, install Pandas using pip:

pip install pandas


Example with DataFrame

Let’s say you have a large dataset from a survey conducted in Chicago. You can use Pandas to split this file:

import pandas as pd

def split_dataframe(file_path, rows_per_file):
    df = pd.read_csv(file_path)

    file_count = 1
    for i in range(0, len(df), rows_per_file):
        df_subset = df.iloc[i:i+rows_per_file]
        df_subset.to_csv(f'survey_data_{file_count}.csv', index=False)
        file_count += 1

if __name__ == "__main__":
    split_dataframe('survey_data.csv', 1000)

This script reads the CSV file into a DataFrame, splits it into smaller DataFrames with iloc, and writes them to new CSV files. Each resulting file contains up to 1000 rows of survey data.
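Pandas can also do the batching for you: passing chunksize to pd.read_csv returns an iterator of DataFrames, so the full dataset never has to fit in memory at once. A sketch of that variant (the survey_part_*.csv names are assumptions):

```python
import pandas as pd

def split_dataframe_chunked(file_path, rows_per_file):
    # chunksize turns read_csv into an iterator of DataFrames,
    # each holding at most rows_per_file rows.
    reader = pd.read_csv(file_path, chunksize=rows_per_file)
    for file_count, chunk in enumerate(reader, start=1):
        chunk.to_csv(f'survey_part_{file_count}.csv', index=False)
```

This is usually the better choice when the input is large, since it combines the convenience of Pandas with bounded memory use.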


Handle Large Files Efficiently

When dealing with extremely large files, reading the entire file into memory may not be feasible. In such cases, processing the file line by line is more efficient.

Example with Line-by-Line Processing

Consider a scenario where you have a massive log file from a server in San Francisco. You can process it line by line to avoid memory issues:

def split_large_file(file_path, lines_per_file):
    with open(file_path, 'r') as file:
        file_count = 1
        lines = []

        for line in file:
            lines.append(line)
            if len(lines) == lines_per_file:
                with open(f'server_log_{file_count}.txt', 'w') as output_file:
                    output_file.writelines(lines)
                file_count += 1
                lines = []

        # Write remaining lines
        if lines:
            with open(f'server_log_{file_count}.txt', 'w') as output_file:
                output_file.writelines(lines)

if __name__ == "__main__":
    split_large_file('server_log.txt', 10000)

This script reads the log file line by line, collects lines into a list, and writes the list to a new file once it reaches the specified number of lines. This way, no more than lines_per_file lines are held in memory at any time.
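The buffering above can also be expressed more compactly with itertools.islice from the standard library, which pulls a fixed number of lines from the open file on each pass. A sketch of that alternative (the log_part_*.txt names are assumptions):

```python
from itertools import islice

def split_with_islice(file_path, lines_per_file):
    with open(file_path, 'r') as file:
        file_count = 1
        while True:
            # islice pulls at most lines_per_file lines from the
            # open file without reading the rest of it.
            chunk = list(islice(file, lines_per_file))
            if not chunk:
                break
            with open(f'log_part_{file_count}.txt', 'w') as output_file:
                output_file.writelines(chunk)
            file_count += 1
```

The behavior is the same as the manual version, including the final partial file, but there is no separate "write remaining lines" step to forget.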


Conclusion

In this tutorial, I explained three methods to split a file into multiple files in Python: basic Python file handling, the split method together with the csv module, and the Pandas library. I also covered practical examples and how to handle very large files efficiently by processing them line by line.
