np.genfromtxt() Function in Python

While I was working on a data analysis project, I needed to import a CSV file with some missing values and custom delimiters. The usual pandas read_csv() wasn’t flexible enough for my specific requirements. That’s when np.genfromtxt() came to my rescue!

In this article, I’ll cover how to use NumPy’s genfromtxt() function to load data from text files with fine-grained control over delimiters, headers, missing values, and data types.

Whether you’re dealing with missing values, custom delimiters, or need to skip rows, this function offers the flexibility you need.

np.genfromtxt()

np.genfromtxt() is a flexible NumPy function that loads data from text files (like CSV or TSV) into NumPy arrays. Unlike simpler loaders such as np.loadtxt(), it gives you extensive control over how your data is processed during import.

The function is especially useful when dealing with:

Files with missing or invalid values
Custom delimiters
Files that need row/column skipping
Data that requires type conversion

Basic Usage of np.genfromtxt()

Let’s start with a simple example. Suppose you have a CSV file called ‘sales_data.csv’ with quarterly sales figures for different US states:

import numpy as np

data = np.genfromtxt(
    'sales_data.csv',
    delimiter=',',
    dtype=str
)
print(data)

Output:

[['id' 'q1_sales' 'q2_sales' 'q3_sales' 'q4_sales']
 ['1' '1200' '1300' '1250' '1400']
 ['2' '1500' '1600' '1550' '1650']
 ['3' '1100' '1000' '1150' '1200']]

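This loads the CSV file using a comma as the delimiter. Because dtype=str is passed, every field (including the header row) is kept as text. Without a dtype argument, genfromtxt() tries to convert every value to a floating-point number, and text it cannot parse, such as the header, becomes nan.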
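
To make the effect of the dtype argument concrete, here is a minimal sketch. It uses an in-memory io.StringIO buffer as a stand-in for sales_data.csv, and the rows in it are illustrative only.

```python
import io
import numpy as np

# Hypothetical stand-in for the first rows of sales_data.csv
csv_text = "id,q1_sales,q2_sales\n1,1200,1300\n2,1500,1600\n"

# Default dtype is float: the text header cannot be parsed, so it becomes nan
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# dtype=str keeps every field, header included, as text
as_text = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                        dtype=str, encoding='utf-8')

print(as_float)
print(as_text)
```

In practice you would usually combine the float default with skip_header=1 so the header row never reaches the parser.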

Key Parameters of np.genfromtxt() in Python NumPy

Let me show you some key parameters of np.genfromtxt().

1. Set the Delimiter

The delimiter parameter tells np.genfromtxt() how the fields on each line are separated. For a tab-separated file, pass delimiter='\t':

import numpy as np

# Load the data, skip header
data = np.genfromtxt('us_population.txt', delimiter='\t', skip_header=1)

# Print header manually
print("Year\tPopulation")

# Loop through and print each row with tab separation
for row in data:
    year = int(row[0])
    population = int(row[1])
    print(f"{year}\t{population}")

Output:

Year    Population
2010    309349689
2015    320738994

Setting the correct delimiter is essential to correctly parse structured data using np.genfromtxt().
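
As a rough illustration of what happens when the delimiter does not match the file, the sketch below parses the same semicolon-separated text twice. The data is made up and held in an io.StringIO buffer rather than a real file.

```python
import io
import numpy as np

# Illustrative semicolon-separated data (values are made up)
text = "2010;309.3\n2015;320.7\n"

# Matching delimiter: each line splits into two numeric fields
good = np.genfromtxt(io.StringIO(text), delimiter=';')

# Wrong delimiter: each line is read as one unparseable field, so it becomes nan
bad = np.genfromtxt(io.StringIO(text), delimiter=',')

print(good)
print(bad)
```

An array full of nan values is often the first hint that the delimiter is wrong.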

2. Handle Headers with names and skip_header

If your file has header rows, you can either skip them or use them as field names:

# Skip the first row (header)
data = np.genfromtxt('sales_data.csv', delimiter=',', skip_header=1)

# Use the header as field names
data = np.genfromtxt('sales_data.csv', delimiter=',', names=True)

When you use names=True, the returned array will be a structured array with named fields.
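
Here is a small sketch of what a structured array with named fields looks like in practice. The CSV text is an illustrative in-memory stand-in for sales_data.csv.

```python
import io
import numpy as np

# Hypothetical stand-in for sales_data.csv
csv_text = "id,q1_sales,q2_sales\n1,1200,1300\n2,1500,1600\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

# Field names come from the header row
print(data.dtype.names)

# Columns are accessed by name instead of by position
q1 = data['q1_sales']
print(q1.sum())
```

Note that a structured array is one-dimensional (one record per row), so you index it with data['q1_sales'] rather than data[:, 1].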

3. Deal with Missing Data

One of the most powerful features of genfromtxt() is its ability to handle missing values:

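# Define what missing values look like in your file.
# A comma-separated string applies to every column; a list would be
# matched to columns one by one. Empty fields always count as missing.
data = np.genfromtxt('sales_data.csv', delimiter=',',
                    missing_values='NA,N/A',
                    filling_values=0)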

This replaces ‘NA’, ‘N/A’, or empty fields with 0.
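
The following sketch shows the replacement on a tiny in-memory example. The values are made up, and a single marker string is used so it applies to all columns.

```python
import io
import numpy as np

# Illustrative data: one 'N/A' marker and one empty field
csv_text = "1,1200,N/A\n2,,1600\n3,1100,1000\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     missing_values='N/A',
                     filling_values=0)

print(data)
```

If you leave out filling_values, the same fields come back as nan, which is often preferable when 0 is a legitimate value in your data.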

4. Select Specific Columns

You can choose to import only certain columns:

# Import only columns 0, 2, and 3
data = np.genfromtxt('sales_data.csv', delimiter=',', usecols=(0, 2, 3))
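
A quick sketch of column selection on in-memory data (illustrative values). It also shows that negative indices count from the last column.

```python
import io
import numpy as np

# Illustrative rows: id followed by four quarterly figures
csv_text = "1,1200,1300,1250,1400\n2,1500,1600,1550,1650\n"

# Keep columns 0, 2 and 3 only
subset = np.genfromtxt(io.StringIO(csv_text), delimiter=',', usecols=(0, 2, 3))

# Keep the first and the last column
ends = np.genfromtxt(io.StringIO(csv_text), delimiter=',', usecols=(0, -1))

print(subset)
print(ends)
```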

5. Data Type Conversion

Control the data types of your imported data:

# Set specific dtypes for each column
data = np.genfromtxt('customer_data.csv', delimiter=',',
                    dtype=[('name', 'U20'), ('age', 'i4'), ('revenue', 'f8')])

This will load ‘name’ as a string (unicode with max 20 chars), ‘age’ as an integer, and ‘revenue’ as a float.
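
To show the per-column dtype at work, here is a minimal sketch. The two customer records are invented, and the file is replaced by an io.StringIO buffer without a header row.

```python
import io
import numpy as np

# Made-up customer records (no header row)
csv_text = "Alice,34,1200.5\nBob,45,980.0\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     dtype=[('name', 'U20'), ('age', 'i4'), ('revenue', 'f8')],
                     encoding='utf-8')

print(data['name'])
print(data['age'].dtype)
print(data['revenue'].sum())
```

If your real file has a header row, add skip_header=1 so the header is not parsed as a record.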

Advanced Techniques with np.genfromtxt()

Now, I will explain some advanced techniques with np.genfromtxt() in Python NumPy.

Using Converters for Custom Transformations

Sometimes you need to apply custom transformations during import:

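# With encoding='utf-8', converters receive str values rather than bytes,
# so no .decode() call is needed

# Convert state codes to lowercase
def state_converter(state):
    return state.strip().lower()

# Apply 20% discount to all prices
def price_converter(price):
    return float(price) * 0.8

# skip_header=1 keeps the header row away from the converters
data = np.genfromtxt('sales_data.csv', delimiter=',',
                    skip_header=1, encoding='utf-8',
                    converters={0: state_converter, 2: price_converter})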
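
As another hedged example of a converter, the sketch below turns percentage strings into fractions. The column names and values are hypothetical, and encoding='utf-8' is passed explicitly so the converter receives str values.

```python
import io
import numpy as np

# Illustrative data: the middle column holds percentages as text
csv_text = "1,2.3%,45\n6,78.9%,0\n"

def percent_to_fraction(value):
    # '2.3%' -> 0.023
    return float(value.strip('%')) / 100.0

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=['id', 'share', 'total'],   # hypothetical column names
                     converters={1: percent_to_fraction},
                     encoding='utf-8')

print(data['share'])
```

The converter dictionary is keyed by column index, so only column 1 is transformed and the other columns are parsed as usual.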

Skipping Rows and Footers

Skip rows at the beginning or end of your file:

# Skip first 2 rows and last 3 rows
data = np.genfromtxt('log_file.txt', skip_header=2, skip_footer=3)
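
Here is a small sketch of the same idea on in-memory text. The report layout is invented for illustration: two title lines, three data lines, and three summary lines.

```python
import io
import numpy as np

# Made-up report: 2 header lines, 3 data lines, 3 footer lines
text = (
    "Quarterly report\n"
    "units USD\n"
    "1 2\n"
    "3 4\n"
    "5 6\n"
    "sum 9\n"
    "mean 3\n"
    "rows 3\n"
)

# No delimiter given, so fields are split on whitespace
data = np.genfromtxt(io.StringIO(text), skip_header=2, skip_footer=3)

print(data)
```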

Handling Comments

Ignore comment lines in your data file:

# Lines starting with # are treated as comments ('#' is already the default marker)
data = np.genfromtxt('config_data.txt', comments='#')
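
The sketch below shows comment handling on made-up in-memory data, first with the default '#' marker (including a trailing remark on a data line) and then with a custom '%' marker.

```python
import io
import numpy as np

# Illustrative data with full-line and trailing comments
hash_text = "# sensor export\n1,2\n# calibration note\n3,4  # trailing remark\n"
default_marker = np.genfromtxt(io.StringIO(hash_text), delimiter=',')

# Same values, but the file uses % for comments
percent_text = "% sensor export\n1,2\n3,4\n"
custom_marker = np.genfromtxt(io.StringIO(percent_text), delimiter=',', comments='%')

print(default_marker)
print(custom_marker)
```

Everything from the comment marker to the end of the line is discarded, so trailing remarks do not disturb the numeric fields.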

Practical Examples of Using np.genfromtxt()

Now, I will explain practical examples of using np.genfromtxt() in Python NumPy.

Example 1: Importing Sales Data with Missing Values

Let’s say you have a file with US sales data where some values are missing:

# sales_by_region.csv has missing values marked as 'N/A'
sales_data = np.genfromtxt('sales_by_region.csv', 
                          delimiter=',',
                          names=True,
                          dtype=None,
                          encoding='utf-8',
                          missing_values='N/A',
                          filling_values=0)

# Now you can access data by column names
print(f"Northeast sales: {sales_data['northeast']}")
print(f"Total sales: {np.sum(sales_data['northeast'] + sales_data['midwest'] + sales_data['south'] + sales_data['west'])}")
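
If you want to try this pattern without a file on disk, here is a self-contained sketch. The region figures are invented, and the io.StringIO buffer stands in for sales_by_region.csv.

```python
import io
import numpy as np

# Invented figures; 'N/A' marks missing entries
csv_text = (
    "quarter,northeast,midwest,south,west\n"
    "Q1,100,N/A,300,400\n"
    "Q2,150,250,N/A,450\n"
)

sales = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      names=True, dtype=None, encoding='utf-8',
                      missing_values='N/A', filling_values=0)

regions = ['northeast', 'midwest', 'south', 'west']
total = sum(int(sales[region].sum()) for region in regions)

print(sales['midwest'])
print(f"Total sales: {total}")
```

Summing per region in a loop keeps the expression readable when there are many columns.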

Example 2: Processing Weather Data

Here’s how to process a weather data file with mixed types:

# Import weather data with date, location, and temperature columns
weather = np.genfromtxt('us_weather_data.csv',
                       delimiter=',',
                       names=True,
                       dtype=[('date', 'U10'), ('location', 'U20'), ('temp_f', 'f4'), ('humidity', 'f4')],
                       encoding='utf-8')

# Calculate average temperature by filtering for a specific location
nyc_temps = weather[weather['location'] == 'New York City']['temp_f']
print(f"Average temperature in NYC: {np.mean(nyc_temps):.1f}°F")
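
The filtering step relies on a boolean mask over a structured array. Here is a minimal sketch of that step with made-up readings. It uses skip_header=1 together with an explicit dtype, which is one of several ways to handle the header.

```python
import io
import numpy as np

# Made-up readings for illustration
csv_text = (
    "date,location,temp_f,humidity\n"
    "2024-01-01,New York City,30.0,55\n"
    "2024-01-01,Chicago,20.0,60\n"
    "2024-01-02,New York City,34.0,50\n"
)

weather = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1,
                        dtype=[('date', 'U10'), ('location', 'U20'),
                               ('temp_f', 'f4'), ('humidity', 'f4')],
                        encoding='utf-8')

# Boolean mask: True for the rows recorded in New York City
is_nyc = weather['location'] == 'New York City'
nyc_mean = float(weather[is_nyc]['temp_f'].mean())

print(f"Average temperature in NYC: {nyc_mean:.1f}°F")
```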

Example 3: Loading Fixed-Width Format Data

Government data is often provided in fixed-width format:

# Census data in fixed-width format
census_data = np.genfromtxt('us_census_sample.txt',
                           delimiter=[11, 8, 6, 4, 10],  # Column widths
                           dtype=None,
                           names=['county', 'state', 'population', 'year', 'income'],
                           autostrip=True,  # Remove the padding around each field
                           encoding='utf-8')

# Find counties with population over 1 million
large_counties = census_data[census_data['population'] > 1000000]
print(f"Number of large counties: {len(large_counties)}")
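
To experiment with fixed-width parsing, the sketch below builds a small fixed-width text in memory and reads it back. The county names, widths, and population figures are all hypothetical.

```python
import io
import numpy as np

# Hypothetical records, padded to widths of 11, 8 and 10 characters
records = [("Alpha", "AA", 1500000), ("Beta", "BB", 64000), ("Gamma", "AA", 2300000)]
text = "".join(
    f"{county:<11}{state:<8}{population:>10}\n"
    for county, state, population in records
)

data = np.genfromtxt(io.StringIO(text),
                     delimiter=[11, 8, 10],   # a list of ints means column widths
                     dtype=None,
                     names=['county', 'state', 'population'],
                     autostrip=True,          # drop the padding spaces
                     encoding='utf-8')

large = data[data['population'] > 1_000_000]
print(f"Number of large counties: {len(large)}")
```

Without autostrip=True, the text fields would keep their padding, so a comparison such as data['county'] == 'Alpha' would not match.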

np.genfromtxt() vs. Alternative Methods

While np.genfromtxt() is powerful, there are alternatives worth considering:

pandas.read_csv(): Better for tabular data you plan to manipulate with pandas
np.loadtxt(): Faster but less flexible than genfromtxt()
csv module: More control but requires more code
np.fromfile(): Better for binary data

np.genfromtxt() shines when you need precise control over the import process, especially with messy data files containing missing values or requiring type conversions.
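
As a rough illustration of the flexibility trade-off, the sketch below feeds the same text with an empty field to np.loadtxt() and np.genfromtxt(). It assumes loadtxt is called with its default settings, where an empty numeric field raises a ValueError.

```python
import io
import numpy as np

# Illustrative data: the second row has an empty field
text = "1,2\n3,\n"

# loadtxt (default settings) cannot convert the empty field
try:
    np.loadtxt(io.StringIO(text), delimiter=',')
    loadtxt_error = None
except ValueError as exc:
    loadtxt_error = exc

# genfromtxt fills the gap with nan instead of failing
data = np.genfromtxt(io.StringIO(text), delimiter=',')

print(f"loadtxt failed: {loadtxt_error is not None}")
print(data)
```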

Common Issues with np.genfromtxt() and Solutions

Here are some common issues you may run into with np.genfromtxt(), along with ways to work around them.

Memory Errors with Large Files

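np.genfromtxt() reads the whole file into memory, so very large files can cause memory errors. One workaround is to read the file in chunks of lines and parse each chunk separately:

import numpy as np
from itertools import islice

def process_data(data):
    # Placeholder: replace with your own per-chunk processing
    print(data.shape)

# Process a large file in chunks
chunk_size = 1000
with open('large_dataset.csv', 'r') as f:
    header = f.readline()  # Read and process header separately

    while True:
        # Take up to chunk_size lines; an empty list means end of file
        chunk = list(islice(f, chunk_size))
        if not chunk:
            break

        # genfromtxt also accepts a list of strings as input
        data = np.genfromtxt(chunk, delimiter=',')
        process_data(data)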
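
A self-contained variant of the chunking idea is sketched below. The helper name column_sums_in_chunks is hypothetical, and an io.StringIO buffer stands in for the large file. It accumulates column sums so that only one chunk is held as an array at a time.

```python
import io
from itertools import islice

import numpy as np

def column_sums_in_chunks(handle, chunk_size, delimiter=','):
    """Sum each column of a text stream, parsing chunk_size lines at a time."""
    total = None
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            break
        # atleast_2d keeps a single-row chunk two-dimensional
        data = np.atleast_2d(np.genfromtxt(chunk, delimiter=delimiter))
        part = data.sum(axis=0)
        total = part if total is None else total + part
    return total

# Illustrative stand-in for a large file: a header plus 10 rows
text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))
handle = io.StringIO(text)
handle.readline()  # skip the header

totals = column_sums_in_chunks(handle, chunk_size=4)
print(totals)
```

The same pattern works for any reduction that can be updated chunk by chunk, such as counts, minima, or maxima.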

Deal with Inconsistent Data Types

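When columns hold different types (for example, text in one and numbers in another), pass dtype=None so that the type of each column is inferred individually: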

# Auto-detect types
mixed_data = np.genfromtxt('mixed_types.csv',
                          delimiter=',',
                          names=True,
                          dtype=None,
                          encoding='utf-8')
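
To check what was inferred, you can inspect the dtype of the result. A minimal sketch with invented in-memory data:

```python
import io
import numpy as np

# Invented rows mixing text, integers and floats
csv_text = "name,age,score\nAnn,31,88.5\nBen,45,92.0\n"

mixed = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      names=True, dtype=None, encoding='utf-8')

# One inferred type per column: kind 'U' is text, 'i' is integer, 'f' is float
kinds = {name: mixed.dtype[name].kind for name in mixed.dtype.names}
print(kinds)
```

The exact integer width depends on the platform and NumPy version, which is why the sketch looks at the dtype kind rather than a specific type such as int64.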

I hope you found this article helpful! np.genfromtxt() is one of those functions I use almost daily in my data analysis work because of its flexibility. While it might seem complex at first, its power to handle virtually any text-based data file makes it well worth learning.

Whether you’re working with economic indicators, census data, or any other dataset with challenging formatting, genfromtxt() can likely handle it with just a few parameters.
