Working with MATLAB files in Python projects is a common challenge I face in data science workflows. When collaborating with MATLAB users or working with legacy scientific datasets, knowing how to properly load and manipulate .mat files becomes essential.
In this article, I’ll walk you through several practical methods to load MAT files in Python. I’ll cover traditional MAT files with SciPy and the newer HDF5-based format with h5py, along with real-world examples.
So let’s start.
What are MAT Files and Why Use Them with Python?
MAT files are MATLAB’s native file format for storing workspace variables. They’re commonly used in scientific and engineering applications to store numerical arrays, cell arrays, and structs.
As a Python developer, I’ve found numerous scenarios where I need to work with these files:
- Processing legacy research datasets
- Collaborating with engineers using MATLAB
- Working with scientific tools that export MAT files
Let’s dive into how we can handle them efficiently in Python.
Method 1: Load Standard MAT Files with SciPy
The simplest way to load MAT files in Python is SciPy’s loadmat() function. It reads MAT files in the v4 and Level 5 formats (saved with MATLAB’s -v4, -v6, or -v7 options), which covers every version before 7.3.
Here’s how to do it:
from scipy import io
import numpy as np
# Load the MAT file
mat_data = io.loadmat('weather_data.mat')
# Print the keys (variable names) in the MAT file
print(mat_data.keys())
# Access a specific variable
temperature = mat_data['temperature']
print(temperature.shape)
Output:
dict_keys(['__header__', '__version__', '__globals__', 'temperature'])
(1, 365)

In this example, I’m loading weather data stored in a MAT file. The temperature variable comes back as a 1-by-365 array of readings, which I can then analyze or visualize using Python libraries.
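If you only want to see what a MAT file contains before loading it, scipy.io.whosmat() lists each variable’s name, shape, and MATLAB class without reading the data, and loadmat()’s variable_names argument loads just the variables you ask for. The sketch below is self-contained: it first writes a stand-in weather_data.mat to a temporary folder with savemat() (the random values are placeholders, not real weather data), then inspects and loads it.

```python
import os
import tempfile

import numpy as np
from scipy import io

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'weather_data.mat')
    # Placeholder data: savemat stores a 1-D array as a 1-by-365 row vector
    io.savemat(path, {'temperature': np.random.rand(365)})

    # (name, shape, MATLAB class) for each variable, without loading the data
    variables = io.whosmat(path)
    print(variables)

    # Load only the variables you need
    mat_data = io.loadmat(path, variable_names=['temperature'])
    temperature = mat_data['temperature']

print(temperature.shape)
```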
Understand the Loaded Data Structure
When you load a MAT file with loadmat(), you get a Python dictionary where:
- Keys are the variable names from MATLAB, plus three metadata entries: __header__, __version__, and __globals__
- Values are the corresponding NumPy arrays
One important thing to note is that MATLAB stores matrices in column-major order, while NumPy uses row-major order by default. loadmat() handles this conversion for you, so the element A(i, j) in MATLAB is available as A[i-1, j-1] in Python.
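Because every MATLAB variable is at least two-dimensional, vectors and scalars come back from loadmat() with extra singleton dimensions, like the (1, 365) temperature array in Method 1. Passing squeeze_me=True removes those dimensions. This minimal sketch writes a small placeholder file first so it runs on its own; the file and variable names are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import io

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.mat')
    io.savemat(path, {'readings': np.arange(365.0), 'scale': 2.5})

    raw = io.loadmat(path)                        # MATLAB-style 2-D shapes
    squeezed = io.loadmat(path, squeeze_me=True)  # singleton dimensions removed

print(raw['readings'].shape, squeezed['readings'].shape)
print(raw['scale'].shape, squeezed['scale'])
```

Squeezing is convenient for analysis, but keep the default if you need to write the data back to MATLAB with the original row or column orientation.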
Method 2: Load Newer MAT Files (HDF5 Format)
Newer MAT files (version 7.3, saved with MATLAB’s -v7.3 option) are stored in the HDF5 format, and loadmat() cannot read them: it raises a NotImplementedError suggesting an HDF5 reader instead. For these files, use the h5py package:
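import h5py
import numpy as np
# Open the v7.3 MAT file read-only
with h5py.File('neural_network_model.mat', 'r') as f:
    # List the top-level variables stored in the file
    print("Keys:", list(f.keys()))
    # Read a dataset into memory as a NumPy array
    weights = np.array(f['model_weights'])
    # h5py returns MATLAB arrays transposed (MATLAB is column-major);
    # use weights.T if you need MATLAB's original orientation
    print("Weights shape:", weights.shape)
    print("Weights:")
    print(weights)
Output: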
Keys: ['model_weights']
Weights shape: (3, 3)
Weights:
[[0.20743864 0.85653662 0.03043611]
[0.02392481 0.14914895 0.16830728]
[0.29223175 0.77879602 0.41192636]]

I’ve used this approach when working with large neural network models trained in MATLAB. MATLAB requires the v7.3 format for variables of 2 GB or more, so files like these can’t be opened with loadmat().
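If you don’t know in advance which format a file uses, one option is to try loadmat() first and fall back to h5py when SciPy raises NotImplementedError. The helper below is a hypothetical sketch, not a SciPy or h5py API: its h5py branch reads only top-level datasets (structs stored as groups and cell references are skipped) and transposes them back to MATLAB’s orientation. The usage example exercises only the loadmat() branch with a file written by savemat(), so it runs without h5py installed.

```python
import os
import tempfile

import numpy as np
from scipy import io


def load_any_mat(path):
    """Hypothetical helper: load a MAT file with SciPy, or with h5py if it is v7.3."""
    try:
        return io.loadmat(path)
    except NotImplementedError:
        # SciPy raises NotImplementedError for v7.3 (HDF5) files
        import h5py  # imported lazily so pre-7.3 files don't need h5py installed
        with h5py.File(path, 'r') as f:
            # Read top-level datasets only; groups (structs) are skipped.
            # .T undoes the transposition caused by MATLAB's column-major layout.
            return {name: np.asarray(obj[()]).T
                    for name, obj in f.items()
                    if isinstance(obj, h5py.Dataset)}


# Usage with a pre-7.3 file, which takes the loadmat() branch
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'old_format.mat')
    io.savemat(path, {'model_weights': np.eye(3)})
    data = load_any_mat(path)

print(data['model_weights'])
```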
Work with Nested Structures in HDF5 MAT Files
HDF5 MAT files often contain nested structures that need special handling:
import h5py
import numpy as np
from pprint import pprint

def get_nested_data(f, path):
    """Recursively read an HDF5 group into nested dicts of NumPy arrays."""
    node = f[path]
    if isinstance(node, h5py.Group):
        # MATLAB structs are stored as groups: recurse into each field
        return {k: get_nested_data(f, f'{path}/{k}') for k in node.keys()}
    # Datasets (leaf values) are read into memory
    return np.array(node)

with h5py.File('complex_dataset.mat', 'r') as f:
    data = get_nested_data(f, 'experiment_results')
pprint(data)
Output:
{'humidity': array([33.88781141, 48.17680718, 48.18671388, 32.95638762, 33.22204234,
56.90440239, 58.34572791, 57.39254405, 33.50733042, 44.92187077]),
'nested': {'pH': array([6.5 , 6.61111111, 6.72222222, 6.83333333, 6.94444444,
7.05555556, 7.16666667, 7.27777778, 7.38888889, 7.5 ]),
'pressure': array([101.63433247, 101.83805656, 100.5685385 , 101.72200795,
101.27633141, 102.8443931 , 100.34990418, 101.54468715,
100.5167384 , 101.94107433])},
'temperature': array([26.53851299, 25.36753071, 24.46883473, 27.07985764, 24.60442595,
25.48856053, 22.23081067, 25.24387026, 24.23305447, 28.9405108 ])}

This recursive approach has helped me extract complex structures from medical imaging datasets that contained nested patient data, scan parameters, and image matrices.
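For structs in pre-7.3 files, you don’t need a recursive helper: loadmat(..., simplify_cells=True), available in SciPy 1.5 and later, converts MATLAB structs into nested Python dicts and also squeezes singleton dimensions. This sketch builds a small placeholder file with the same layout as the example above (savemat writes Python dicts as MATLAB structs); the values are illustrative, not real measurements.

```python
import os
import tempfile

import numpy as np
from scipy import io

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'nested_v7.mat')
    # savemat writes Python dicts as MATLAB structs, including nested ones
    io.savemat(path, {'experiment_results': {
        'temperature': np.array([26.5, 25.4, 24.5]),
        'nested': {'pH': np.array([6.5, 7.0, 7.5])},
    }})
    # simplify_cells=True turns structs back into nested dicts (and implies squeeze_me=True)
    data = io.loadmat(path, simplify_cells=True)['experiment_results']

print(data['nested']['pH'])
```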
Method 3: Convert MAT Data to Pandas DataFrame
For data analysis, I often need to convert the loaded MAT data to a Pandas DataFrame:
import pandas as pd
from scipy import io
# Load MAT file
mat_data = io.loadmat('stock_prices.mat')
# Convert to DataFrame
stock_data = pd.DataFrame(
    mat_data['daily_prices'],
    columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
)
# Convert MATLAB datenum to datetime: 719529 is the datenum of 1970-01-01
stock_data['Date'] = pd.to_datetime(stock_data['Date'] - 719529, unit='D')
# Now you can use pandas functionality
print(stock_data.head())
I’ve used this technique when analyzing financial datasets from MATLAB users, where converting to a DataFrame makes time series analysis and visualization much easier.
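The same date conversion works without pandas. MATLAB’s datenum counts days from 1 January of year 0, which puts it exactly 366 days ahead of Python’s date.toordinal(). A minimal sketch using only the standard library (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (days since year 0, with fractional days) to datetime."""
    whole_days = int(datenum)
    fraction = datenum - whole_days
    # MATLAB day 1 is 1 Jan of year 0; Python ordinal 1 is 1 Jan of year 1 (366 days later)
    return datetime.fromordinal(whole_days - 366) + timedelta(days=fraction)


print(matlab_datenum_to_datetime(738887.5))  # 2023-01-01 12:00:00
print(matlab_datenum_to_datetime(719529))    # 1970-01-01 00:00:00
```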
Method 4: Handle MATLAB Cell Arrays
loadmat() returns a MATLAB cell array as a NumPy object array whose elements are themselves arrays, so extracting the values takes an extra step:
from scipy import io
import numpy as np
# Load MAT file with cell arrays
mat_data = io.loadmat('survey_responses.mat')
# Access cell array data
responses = mat_data['text_responses']
# Extract string data from each cell of the 1-by-N cell array
text_data = []
for cell in responses[0]:
    # Each cell holds a 1-element string array; empty cells have size 0
    text = cell[0] if cell.size > 0 else ''
    text_data.append(text)
# Now you have a list of strings
print(f"Number of responses: {len(text_data)}")
This approach was particularly useful when I needed to process text survey data collected and stored by MATLAB users for a university research project.
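The loop above assumes a 1-by-N cell array. A slightly more general sketch flattens a cell array of any shape in MATLAB’s column-major (linear index) order. The helper name and the sample responses are hypothetical; savemat() is used only to create a test file, since it writes a NumPy object array as a MATLAB cell array.

```python
import os
import tempfile

import numpy as np
from scipy import io


def cell_to_strings(cell_array):
    """Hypothetical helper: flatten a loadmat() cell array of char vectors into str."""
    strings = []
    # order='F' walks the cells in MATLAB's column-major (linear index) order
    for item in cell_array.ravel(order='F'):
        # Each cell holds a 1-element string array; empty cells have size 0
        strings.append(str(item[0]) if item.size > 0 else '')
    return strings


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'survey_demo.mat')
    # An object array is written as a MATLAB cell array
    io.savemat(path, {'text_responses': np.array(['yes', 'no', 'maybe'], dtype=object)})
    responses = io.loadmat(path)['text_responses']

print(cell_to_strings(responses))
```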
Method 5: Save Python Data to MAT Files
Sometimes you might need to go the other way and save Python data to MAT files:
from scipy import io
import numpy as np
# Create some data
data = {
    'temperature': np.random.rand(100, 5),  # Data for 5 cities over 100 days
    # An object array of strings is saved as a MATLAB cell array
    'cities': np.array(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'], dtype=np.object_)
}
# Save to MAT file (Level 5 format, which MATLAB's load can read)
io.savemat('python_generated_data.mat', data)
I’ve used this when collaborating with MATLAB users who needed my analysis results in their native format.
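Before sending a file to a MATLAB user, it helps to load it straight back with loadmat() and check the contents. This sketch also passes savemat()’s do_compression=True flag, which zlib-compresses each variable; it writes to a temporary folder so nothing is left on disk.

```python
import os
import tempfile

import numpy as np
from scipy import io

temperature = np.random.rand(100, 5)
cities = np.array(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'], dtype=object)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'python_generated_data.mat')
    # do_compression=True zlib-compresses each variable in the file
    io.savemat(path, {'temperature': temperature, 'cities': cities}, do_compression=True)
    loaded = io.loadmat(path)

# Numeric data round-trips unchanged; the cell array comes back as a 1-by-5 object array
same_values = np.array_equal(loaded['temperature'], temperature)
city_names = [str(c[0]) for c in loaded['cities'].ravel()]
print(same_values, city_names)
```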
Performance Considerations When Loading Large MAT Files
When dealing with large scientific datasets, memory usage becomes a concern. Here’s how to handle large MAT files efficiently:
import h5py
import numpy as np
# Memory-efficient loading for large files
with h5py.File('large_dataset.mat', 'r') as f:
    # Get shape information without loading all data
    data_shape = f['large_array'].shape
    print(f"Array dimensions: {data_shape}")
    # Load only a slice of the data; h5py reads just this region from disk
    subset = f['large_array'][0:100, 0:100]
    print(f"Subset shape: {subset.shape}")
This technique saved me when processing satellite imagery data that was too large to fit in memory all at once.
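Slicing also lets you compute statistics over the whole array without loading it at once. The sketch below computes column means in blocks of rows, so only block_rows rows are held in memory at a time. It accepts anything with a .shape and NumPy-style slicing, which includes an h5py Dataset such as f['large_array']; the usage line uses a small NumPy array so it runs without h5py. The function name and default block size are illustrative assumptions.

```python
import numpy as np


def blockwise_column_mean(dataset, block_rows=1000):
    """Hypothetical helper: column means of a 2-D array-like, read block_rows rows at a time."""
    n_rows, n_cols = dataset.shape
    totals = np.zeros(n_cols, dtype=np.float64)
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # With an h5py Dataset, only this slice is read from disk
        block = np.asarray(dataset[start:stop], dtype=np.float64)
        totals += block.sum(axis=0)
    return totals / n_rows


data = np.arange(12.0).reshape(6, 2)
print(blockwise_column_mean(data, block_rows=4))  # [5. 6.]
```

Remember that h5py exposes v7.3 arrays transposed, so the rows you iterate over here correspond to MATLAB’s columns.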
Troubleshoot Common Issues
Let me show you how to troubleshoot some common issues.
Handle MATLAB Strings
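MATLAB text can be awkward to work with because loadmat() returns it as NumPy string arrays nested inside other arrays. Loading with squeeze_me=True removes the extra nesting, after which each label converts cleanly to a Python string:
from scipy import io
import numpy as np
# chars_as_strings=True is the default; squeeze_me=True drops singleton dimensions
mat_data = io.loadmat('text_data.mat', chars_as_strings=True, squeeze_me=True)
# Works for a cell array of strings or a char matrix (whose rows are space-padded)
labels = [str(label).rstrip() for label in np.atleast_1d(mat_data['labels'])]
Deal with Compression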
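MATLAB’s default v7 format compresses variables with zlib, and loadmat() decompresses them transparently, so no extra option is needed:
from scipy import io
# zlib-compressed v7 files are decompressed automatically
mat_data = io.loadmat('compressed_data.mat')
The struct_as_record=False option is unrelated to compression: it loads MATLAB structs as objects with attribute access (data.field) instead of NumPy record arrays. To write compressed files from Python, pass do_compression=True to savemat().
I hope you found this guide helpful for working with MAT files in Python using SciPy. Whether you’re collaborating with MATLAB users or processing scientific datasets, these techniques should make your workflow smoother.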
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries.