Recently, I was working on a machine learning project where I needed to process a large dataset with mostly zero values. The regular NumPy arrays were consuming too much memory and slowing down my computations. The problem is that dense matrices are inefficient for sparse data, so we need a specialized data structure.
In this article, I’ll cover how to use SciPy’s CSR matrix format to efficiently handle sparse data in Python (with examples from text processing to network analysis).
So let’s get started!
What is a Sparse Matrix and Why Use CSR Format?
A sparse matrix is a matrix where most elements are zero. Think of a term-document matrix in text analysis where you might have thousands of documents and tens of thousands of words, but each document only contains a tiny fraction of all possible words.
The Compressed Sparse Row (CSR) format stores only the non-zero elements along with their positions, making it memory-efficient and fast for many operations.
Here’s a quick comparison:
import numpy as np
from scipy.sparse import csr_matrix
# Create a matrix with mostly zeros
dense_matrix = np.zeros((10000, 10000))
dense_matrix[0, 1] = 1
dense_matrix[1, 2] = 2
dense_matrix[9999, 9999] = 3
# Create the same matrix in CSR format
sparse_matrix = csr_matrix(dense_matrix)
# Compare memory usage: a CSR matrix stores three arrays (data, indices, indptr)
dense_bytes = dense_matrix.nbytes
sparse_bytes = sparse_matrix.data.nbytes + sparse_matrix.indices.nbytes + sparse_matrix.indptr.nbytes
print(f"Dense matrix size: {dense_bytes / 1024 / 1024:.2f} MB")
print(f"Sparse matrix size: {sparse_bytes / 1024:.2f} KB")
The difference in memory usage can be dramatic, often 100× or more for very sparse data. In this example, the dense array needs about 763 MB, while the CSR arrays need only tens of kilobytes.
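The comparison above still allocates the full 10000 × 10000 dense array first (roughly 800 million bytes of float64 values). Here is a minimal sketch of building the same matrix directly from coordinates, so the dense array never exists; the dense footprint is computed arithmetically instead. The variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

shape = (10000, 10000)

# Coordinates and values of the three non-zero entries used above
rows = np.array([0, 1, 9999])
cols = np.array([1, 2, 9999])
vals = np.array([1.0, 2.0, 3.0])

# Build the CSR matrix directly; no dense array is ever allocated
sparse_matrix = csr_matrix((vals, (rows, cols)), shape=shape)

# Memory a dense float64 array of this shape would need
dense_bytes = shape[0] * shape[1] * np.dtype(np.float64).itemsize

# Memory actually used by the three CSR arrays
sparse_bytes = (sparse_matrix.data.nbytes
                + sparse_matrix.indices.nbytes
                + sparse_matrix.indptr.nbytes)

print(f"Dense equivalent: {dense_bytes / 1024 / 1024:.2f} MB")
print(f"CSR storage:      {sparse_bytes / 1024:.2f} KB")
print(f"Ratio:            {dense_bytes / sparse_bytes:.0f}x")
```

Most of that CSR footprint is the indptr array, which has one entry per row plus one, so even a nearly empty CSR matrix costs memory proportional to its number of rows.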
Create a CSR Matrix in SciPy
There are several ways to create a CSR matrix in SciPy:
Method 1 – From an Existing Array
import numpy as np
from scipy.sparse import csr_matrix
# Create from a dense NumPy array
array = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
csr = csr_matrix(array)
print(csr)
Output:
<Compressed Sparse Row sparse matrix of dtype 'int32'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 0) 1
(1, 1) 2
(2, 2) 3

This method converts a dense NumPy array into a CSR matrix automatically.
Internally, SciPy extracts:
- data: The non-zero values from the array → [1, 2, 3]
- indices: The column indices of those non-zero values → [0, 1, 2]
- indptr: Index pointers showing where each row starts in data → [0, 1, 2, 3]
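You can inspect these three arrays directly through the data, indices, and indptr attributes of the matrix. A short check, rebuilding the same diagonal matrix so the snippet runs on its own:

```python
import numpy as np
from scipy.sparse import csr_matrix

array = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
csr = csr_matrix(array)

# The three arrays that make up the CSR representation
print("data:   ", csr.data)     # non-zero values, row by row
print("indices:", csr.indices)  # column index of each stored value
print("indptr: ", csr.indptr)   # position in data where each row starts
```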
Method 2 – From COO Format (Coordinates)
import numpy as np
from scipy.sparse import csr_matrix
# Create from (data, (row_ind, col_ind)) format
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])
data = np.array([1, 2, 3])
csr = csr_matrix((data, (row, col)), shape=(3, 3))
print(csr)
Output:
<Compressed Sparse Row sparse matrix of dtype 'int32'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 0) 1
(1, 1) 2
(2, 2) 3

This method builds the matrix using a coordinate format, where you specify:
- row: The row indices of the non-zero values
- col: The column indices
- data: The corresponding non-zero values
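One detail worth knowing about this coordinate-style constructor: if the same (row, col) pair appears more than once, SciPy adds the values together rather than keeping only one of them. A small sketch with a deliberately repeated coordinate:

```python
import numpy as np
from scipy.sparse import csr_matrix

# The coordinate (0, 1) appears twice on purpose
row = np.array([0, 0, 2])
col = np.array([1, 1, 2])
data = np.array([4, 5, 6])

csr = csr_matrix((data, (row, col)), shape=(3, 3))

# The duplicate entries are summed: 4 + 5 = 9
print(csr[0, 1])
print(csr.toarray())
```

This summing behavior is handy for accumulating counts (for example, co-occurrence counts), but it can be surprising if you expected later entries to overwrite earlier ones.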
Method 3 – Use CSR Constructor Directly
import numpy as np
from scipy.sparse import csr_matrix
# Direct CSR format components
indptr = np.array([0, 1, 2, 3])
indices = np.array([0, 1, 2])
data = np.array([1, 2, 3])
csr = csr_matrix((data, indices, indptr), shape=(3, 3))
print(csr)
Output:
<Compressed Sparse Row sparse matrix of dtype 'int32'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 0) 1
(1, 1) 2
(2, 2) 3

The third method uses the internal CSR format directly, which consists of three arrays:
- data: Contains the non-zero values
- indices: Contains the column index of each non-zero value
- indptr: Contains the positions in data where each row starts, so row i occupies data[indptr[i]:indptr[i+1]]
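To make the role of indptr concrete, here is a minimal pure-Python sketch of a matrix-vector product that walks the three arrays directly. The helper name csr_matvec is hypothetical and only for illustration; in practice you would use csr.dot(v) or csr @ v, which run in compiled code.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_matvec(data, indices, indptr, v):
    """Hypothetical helper: multiply a CSR matrix (given by its arrays) by vector v."""
    n_rows = len(indptr) - 1
    result = np.zeros(n_rows, dtype=np.result_type(data, v))
    for i in range(n_rows):
        # Row i's non-zeros live in data[indptr[i]:indptr[i + 1]]
        for k in range(indptr[i], indptr[i + 1]):
            result[i] += data[k] * v[indices[k]]
    return result


# A 3x3 matrix with several non-zeros per row:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 6]]
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, indices, indptr), shape=(3, 3))

v = np.array([1, 2, 3])
manual = csr_matvec(csr.data, csr.indices, csr.indptr, v)
print(manual)
print(csr.dot(v))  # SciPy's compiled version gives the same values
```

Each row is processed by reading one contiguous slice of data and indices, which is why CSR is fast for row-wise work and matrix-vector products.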
Convert Between Different Matrix Formats
Sometimes you may need to convert between different sparse matrix formats or to/from dense matrices:
# From CSR to dense
dense_array = csr.toarray()
# From CSR to CSC (Compressed Sparse Column)
from scipy.sparse import csc_matrix
csc = csr.tocsc()
# From CSR to COO (Coordinate format)
coo = csr.tocoo()
# Back to CSR
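csr_again = coo.tocsr()
Each format has its strengths: CSC is efficient for column slicing and column-wise operations, COO is convenient for building matrices from coordinate lists and converting between formats, and CSR is efficient for row slicing and matrix-vector products. Converting is therefore useful depending on your task.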
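A quick way to confirm that these conversions preserve the values is to compare the dense versions, while the format attribute reports which representation each object uses. This sketch rebuilds a small matrix so it runs on its own:

```python
import numpy as np
from scipy.sparse import csr_matrix

csr = csr_matrix(np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]]))

csc = csr.tocsc()
coo = csr.tocoo()
csr_again = coo.tocsr()

# Each object reports its storage format
print(csr.format, csc.format, coo.format, csr_again.format)

# The stored values are identical across formats
print(np.array_equal(csr.toarray(), csc.toarray()))
print(np.array_equal(csr.toarray(), csr_again.toarray()))

# CSC slices columns cheaply: column 2 as a dense 1-D array
print(csc[:, 2].toarray().ravel())
```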
Efficient Operations with CSR Matrices
CSR matrices excel at row-wise operations and matrix-vector multiplications. Here are some common operations:
Matrix-Vector Multiplication
Efficiently compute matrix-vector products using the fast dot operation supported by CSR matrices.
import numpy as np
from scipy.sparse import csr_matrix
# Create a sparse matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, (row, col)), shape=(3, 3))
# Create a vector
v = np.array([1, 2, 3])
# Multiply
result = csr.dot(v)
print(result)  # Output: [ 7  9 32]
Slicing and Element Access
Easily access rows, elements, or slices of CSR matrices, but be cautious when modifying values due to structural overhead.
# Get a row
row_0 = csr[0, :].toarray().flatten()
print(row_0) # Output: [1 0 2]
# Get a specific element
element = csr[2, 1]
print(element) # Output: 5
# Set a value in place (adding a new non-zero triggers a SparseEfficiencyWarning)
csr[0, 1] = 7
Note that modifying a CSR matrix is generally inefficient because it may require rebuilding the internal data structures. If you need to make many modifications, consider using another format like LIL (List of Lists) for construction, then converting to CSR.
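Following that advice, here is a minimal sketch of incremental construction: fill a lil_matrix element by element (LIL is designed for cheap assignment), then convert once with tocsr() before doing arithmetic. The size and the tridiagonal-style pattern are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

n = 4

# Build incrementally in LIL format, which supports cheap element assignment
lil = lil_matrix((n, n))
for i in range(n):
    lil[i, i] = 2.0             # main diagonal
    if i + 1 < n:
        lil[i, i + 1] = -1.0    # super-diagonal

# Convert once, then use the CSR matrix for fast arithmetic
csr = lil.tocsr()
print(csr.format)
print(csr.toarray())
print(csr @ np.ones(n))
```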
Real-World Applications of CSR Matrices
Let’s look at some real-world applications of CSR matrices.
Text Processing and NLP
One of the most common uses of CSR matrices is in text analysis with the bag-of-words model:
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import csr_matrix
# Example documents
documents = [
"I love machine learning and Python",
"Sparse matrices are efficient",
"Python is great for data science"
]
# Create a vocabulary and document-term matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)
# X is already in CSR format!
print(type(X))  # e.g. <class 'scipy.sparse._csr.csr_matrix'> (the module path varies by SciPy version)
print(X.shape)  # Output: (3, 14)
Network Analysis
CSR matrices are perfect for representing adjacency matrices in graph theory:
import numpy as np
from scipy.sparse import csr_matrix
import networkx as nx
import matplotlib.pyplot as plt
# Create an adjacency matrix for a directed graph
# Edge from 0->1, 0->2, 1->2, 2->0
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 2, 2, 0])
data = np.ones(4) # All edges have weight 1
adj_matrix = csr_matrix((data, (rows, cols)), shape=(3, 3))
# Convert to NetworkX graph
# from_scipy_sparse_array replaces from_scipy_sparse_matrix, which was removed in NetworkX 3.0
G = nx.from_scipy_sparse_array(adj_matrix, create_using=nx.DiGraph)
# Plot
plt.figure(figsize=(8, 6))
nx.draw(G, with_labels=True, node_color='lightblue',
node_size=500, arrowsize=20, font_size=15)
plt.title("Graph from CSR Matrix")
plt.show()
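Because each CSR row lists a node's outgoing edges, some graph queries can be answered straight from the arrays without building a NetworkX graph. A small sketch using the same adjacency matrix: the out-degree comes from consecutive differences of indptr, and a node's successors are a slice of indices.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same directed graph: 0->1, 0->2, 1->2, 2->0
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 2, 2, 0])
adj_matrix = csr_matrix((np.ones(4), (rows, cols)), shape=(3, 3))

# Out-degree of each node = number of stored entries in its row
out_degree = np.diff(adj_matrix.indptr)
print(out_degree)

# Successors of node 0 = column indices stored for row 0
start, end = adj_matrix.indptr[0], adj_matrix.indptr[1]
successors_of_0 = adj_matrix.indices[start:end]
print(successors_of_0)

# In-degree = column sums; the matrix itself is never converted to dense
in_degree = np.asarray(adj_matrix.sum(axis=0)).ravel()
print(in_degree)
```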
Machine Learning with Sparse Features
Many machine learning algorithms in scikit-learn work directly with CSR matrices:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from scipy.sparse import csr_matrix
# Generate a sparse classification problem
X, y = make_classification(n_samples=1000, n_features=10000, n_informative=10,
random_state=42)
# Make X sparse by zeroing out small values
X[abs(X) < 0.9] = 0
X_sparse = csr_matrix(X)
# Train model with sparse matrix
model = LogisticRegression(solver='saga')
model.fit(X_sparse, y)
# Note: this scores the model on its own training data
print(f"Training accuracy: {model.score(X_sparse, y):.2f}")
Performance Tips for CSR Matrices
- Choose the right format for your operations: CSR is great for row-wise operations and matrix-vector products. If you need column-wise operations, consider CSC instead.
- Avoid frequent modifications: If you need to build a matrix incrementally, use a format like LIL or DOK, then convert to CSR when done.
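- Use specialized sparse functions: SciPy provides sparse-aware tools, such as scipy.sparse.linalg for linear solvers and eigenvalue problems, scipy.sparse.csgraph for graph algorithms, and scipy.sparse.hstack/vstack for stacking, that avoid converting to dense arrays.
- Be cautious with operations that might densify: Some operations, such as multiplying by a dense array, adding a dense array, or calling toarray(), return a dense result, which defeats the purpose on large matrices.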
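To see which results stay sparse, this short sketch checks the return types of a few common operations with scipy.sparse.issparse. Products and element-wise products between sparse matrices remain sparse, while multiplying by a dense NumPy array produces a dense NumPy array:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

A = csr_matrix(np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]]))
B = csr_matrix(np.eye(3))
D = np.ones((3, 2))

sparse_product = A @ B        # sparse @ sparse -> sparse
elementwise = A.multiply(B)   # element-wise product with a sparse matrix -> sparse
dense_product = A @ D         # sparse @ dense ndarray -> dense ndarray

print(issparse(sparse_product), issparse(elementwise), issparse(dense_product))
print(type(dense_product))
```

For a 3 × 3 matrix this does not matter, but with millions of rows an unexpected dense result can exhaust memory, so it is worth checking return types in performance-critical code.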
I hope you found this article helpful! CSR matrices are a powerful tool in the Python scientific computing ecosystem, enabling efficient processing of sparse data that would otherwise be impractical to handle due to memory constraints. Whether you’re working with text data, network analysis, or machine learning, understanding this format can significantly improve your code’s performance and capabilities.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere. Check out my profile.