As a Python developer with over a decade of experience in deep learning frameworks, I’ve found PyTorch’s tensor manipulation functions to be incredibly useful yet sometimes overlooked. Among these functions, torch.stack() is one that deserves special attention.
When I first started building neural networks, I often struggled with combining multiple tensors efficiently. That’s when I discovered PyTorch’s stack function, a game-changer for how I structure my data.
In this guide, I’ll walk you through everything you need to know about PyTorch’s stack operation, from basic usage to advanced techniques that have saved me countless hours in my machine learning projects.
PyTorch Stack
The torch.stack() function joins a sequence of tensors along a new dimension. Unlike other joining operations such as concatenation, stack creates a new dimension in the process.
I use stack when I want to combine tensors that all have the same shape. It’s particularly useful for batching data, such as creating mini-batches for training neural networks.
The basic syntax looks like this:
torch.stack(tensors, dim=0)
Where:
tensors is a sequence of tensors (like a list or tuple), all of the same shape
dim is the dimension along which to stack (default is 0)
Basic Usage of PyTorch Stack
Let me show you how I typically use stack in my day-to-day work. First, let’s import PyTorch and create some example tensors:
import torch
# Create three 2x3 tensors
tensor1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor2 = torch.tensor([[7, 8, 9], [10, 11, 12]])
tensor3 = torch.tensor([[13, 14, 15], [16, 17, 18]])
# Print the shape of one tensor
print(f"Shape of tensor1: {tensor1.shape}")
This will output:
Shape of tensor1: torch.Size([2, 3])
Now, let’s stack these tensors along dimension 0 (the default):
stacked_tensors = torch.stack([tensor1, tensor2, tensor3])
print(f"Shape after stacking: {stacked_tensors.shape}")
print(stacked_tensors)
The output will be:
Shape after stacking: torch.Size([3, 2, 3])
tensor([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18]]])
Notice how we now have a 3D tensor with shape [3, 2, 3]. The first dimension (of size 3) corresponds to our three original tensors.
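One quick sanity check I like to run: indexing the stacked result along the new dimension gives back the original tensors. A minimal sketch reusing the tensors above:

```python
import torch

tensor1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor2 = torch.tensor([[7, 8, 9], [10, 11, 12]])
tensor3 = torch.tensor([[13, 14, 15], [16, 17, 18]])

stacked = torch.stack([tensor1, tensor2, tensor3])

# Each slice along the new dim 0 is one of the original tensors
print(torch.equal(stacked[0], tensor1))  # True
print(torch.equal(stacked[2], tensor3))  # True
```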
Stack vs. Cat (Concatenate): Understanding the Difference
One common confusion I’ve encountered when mentoring junior data scientists is distinguishing between torch.stack() and torch.cat(). Let me clear this up:
# Using stack (creates a new dimension)
stacked = torch.stack([tensor1, tensor2], dim=0)
print(f"Stack shape: {stacked.shape}")
# Using cat (combines along an existing dimension)
cat_0 = torch.cat([tensor1, tensor2], dim=0)
print(f"Cat dim 0 shape: {cat_0.shape}")
cat_1 = torch.cat([tensor1, tensor2], dim=1)
print(f"Cat dim 1 shape: {cat_1.shape}")
Output:
Stack shape: torch.Size([2, 2, 3])
Cat dim 0 shape: torch.Size([4, 3])
Cat dim 1 shape: torch.Size([2, 6])
The key difference is that stack creates a new dimension, while cat combines tensors along an existing dimension. I choose between them based on whether I want to introduce a new dimension or not.
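A mental model that has helped the junior data scientists I mentor: stacking along dimension d behaves like unsqueezing each tensor at d and then concatenating. This is a sketch of that equivalence, not a claim about PyTorch’s internal implementation:

```python
import torch

t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t2 = torch.tensor([[7, 8, 9], [10, 11, 12]])

stacked = torch.stack([t1, t2], dim=0)

# Equivalent view: insert a new size-1 dim in each tensor, then cat along it
cat_equiv = torch.cat([t1.unsqueeze(0), t2.unsqueeze(0)], dim=0)

print(torch.equal(stacked, cat_equiv))  # True
```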
Method 1: Stack Along Different Dimensions
While stacking along dimension 0 is most common, I’ve found stacking along other dimensions equally useful in certain scenarios:
# Create two 2x3 tensors
t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t2 = torch.tensor([[7, 8, 9], [10, 11, 12]])
# Stack along dimension 0 (creates a new first dimension)
stacked_dim0 = torch.stack([t1, t2], dim=0)
print(f"Stacked along dim 0 shape: {stacked_dim0.shape}")
# Stack along dimension 1 (creates a new second dimension)
stacked_dim1 = torch.stack([t1, t2], dim=1)
print(f"Stacked along dim 1 shape: {stacked_dim1.shape}")
# Stack along dimension 2 (creates a new third dimension)
stacked_dim2 = torch.stack([t1, t2], dim=2)
print(f"Stacked along dim 2 shape: {stacked_dim2.shape}")
Output:
Stacked along dim 0 shape: torch.Size([2, 2, 3])
Stacked along dim 1 shape: torch.Size([2, 2, 3])
Stacked along dim 2 shape: torch.Size([2, 3, 2])
The dimension we stack along determines where the new dimension is inserted. I often use dimension 1 stacking when working with sequence models like LSTMs.
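For example, when preparing LSTM inputs I often have one (batch, features) tensor per timestep and want a single (batch, seq_len, features) tensor. A minimal sketch with made-up sizes:

```python
import torch

# Hypothetical: 4 timesteps, each a (batch=2, features=8) tensor
timesteps = [torch.randn(2, 8) for _ in range(4)]

# Stacking along dim 1 inserts the sequence dimension in the middle,
# producing the (batch, seq_len, features) layout that
# nn.LSTM(batch_first=True) expects
sequence = torch.stack(timesteps, dim=1)
print(sequence.shape)  # torch.Size([2, 4, 8])
```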
Method 2: Stack Tensors with Different Data Types
When working with real datasets, I sometimes need to combine tensors with different data types. Here’s how I handle it:
# Create tensors with different dtypes
tensor_float = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float32)
tensor_int = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)
# Convert to the same dtype before stacking
tensor_int_as_float = tensor_int.to(torch.float32)
stacked_tensors = torch.stack([tensor_float, tensor_int_as_float])
print(f"Stacked tensor dtype: {stacked_tensors.dtype}")  # torch.float32
Converting explicitly makes the resulting dtype predictable and keeps the behavior consistent across PyTorch versions.
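When I don’t know the dtypes ahead of time, I let PyTorch’s promotion rules pick a common type with torch.promote_types. A sketch reusing the tensors above:

```python
import torch

tensor_float = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float32)
tensor_int = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)

# Ask PyTorch which dtype both tensors can safely be promoted to
common = torch.promote_types(tensor_float.dtype, tensor_int.dtype)
stacked = torch.stack([t.to(common) for t in (tensor_float, tensor_int)])

print(stacked.dtype)  # torch.float32
```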
Method 3: Use Stack in Neural Network Architectures
One of my favorite applications of stack is in building ensemble models:
# Simulating outputs from 3 different models, each predicting probabilities for 5 classes
model1_output = torch.softmax(torch.randn(10, 5), dim=1) # 10 samples, 5 classes
model2_output = torch.softmax(torch.randn(10, 5), dim=1)
model3_output = torch.softmax(torch.randn(10, 5), dim=1)
# Stack the outputs
ensemble_outputs = torch.stack([model1_output, model2_output, model3_output], dim=0)
print(f"Ensemble outputs shape: {ensemble_outputs.shape}") # torch.Size([3, 10, 5])
# Average the predictions (simple ensemble)
ensemble_prediction = torch.mean(ensemble_outputs, dim=0)
print(f"Final prediction shape: {ensemble_prediction.shape}")  # torch.Size([10, 5])
This approach has helped me improve model accuracy in several production systems by combining the strengths of multiple models.
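To turn the averaged probabilities into hard class labels, I usually finish with an argmax over the class dimension. A small sketch continuing the simulated ensemble above:

```python
import torch

# Simulated averaged ensemble probabilities: 10 samples, 5 classes
ensemble_prediction = torch.softmax(torch.randn(10, 5), dim=1)

# Most probable class for each sample
predicted_classes = ensemble_prediction.argmax(dim=1)
print(predicted_classes.shape)  # torch.Size([10])
```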
Real-World Example: Image Batch Processing
Here’s a practical example from a computer vision project I worked on. I needed to process multiple images from a dataset of US landmarks:
# Simulating loading three grayscale images (64x64)
image1 = torch.rand(64, 64) # Simulated Golden Gate Bridge
image2 = torch.rand(64, 64) # Simulated Statue of Liberty
image3 = torch.rand(64, 64) # Simulated Mount Rushmore
# Stack to create a batch
image_batch = torch.stack([image1, image2, image3])
print(f"Batch shape: {image_batch.shape}") # torch.Size([3, 64, 64])
# Now we can process the batch through a CNN
# model(image_batch) would process all images at once
This approach dramatically speeds up neural network training by processing multiple samples simultaneously.
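One caveat from that project: convolutional layers expect a channel dimension, so for grayscale images I add one with unsqueeze before feeding the batch to a model. A sketch with simulated data:

```python
import torch

# Batch of three simulated grayscale 64x64 images
image_batch = torch.stack([torch.rand(64, 64) for _ in range(3)])  # (3, 64, 64)

# nn.Conv2d expects (batch, channels, height, width), so insert a channel axis
cnn_input = image_batch.unsqueeze(1)
print(cnn_input.shape)  # torch.Size([3, 1, 64, 64])
```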
Advanced Usage: Stack with Dynamic Lists
In real projects, I often don’t know in advance how many tensors I’ll need to stack. Here’s how I handle dynamic stacking:
# Simulating a variable number of features extracted from data
feature_list = []
num_samples = 5 # Could vary based on available data
# Generate and collect features
for i in range(num_samples):
    # In a real scenario, this might be feature extraction from different data points
    feature = torch.randn(128)  # 128-dimensional feature vector
    feature_list.append(feature)
# Stack all features at once
feature_batch = torch.stack(feature_list)
print(f"Feature batch shape: {feature_batch.shape}")  # torch.Size([5, 128])
This pattern is extremely common in my data processing pipelines, where the number of samples may vary.
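One gotcha with this pattern: stack insists that every tensor in the list has exactly the same shape. When my collected tensors vary in length, I pad them first; here is a sketch using torch.nn.utils.rnn.pad_sequence:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Variable-length feature vectors; torch.stack(features) would raise an error
features = [torch.randn(3), torch.randn(5), torch.randn(2)]

# pad_sequence zero-pads each tensor to the longest length, then stacks them
padded = pad_sequence(features, batch_first=True)
print(padded.shape)  # torch.Size([3, 5])
```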
Performance Considerations
When working with large datasets, performance matters. I’ve found that stack is generally efficient, but there are some considerations:
import time
# Create a large list of tensors
large_list = [torch.randn(1000, 1000) for _ in range(100)]
# Time the stack operation
start_time = time.time()
stacked = torch.stack(large_list)
end_time = time.time()
print(f"Time to stack 100 large tensors: {end_time - start_time:.4f} seconds")
print(f"Final tensor size: {stacked.shape}, Memory: {stacked.element_size() * stacked.nelement() / 1e6:.2f} MB")
For very large operations, I sometimes pre-allocate the output tensor and fill it manually, which can be more memory-efficient than calling stack directly.
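One pre-allocation option worth knowing: torch.stack accepts an out= argument that writes into an existing tensor instead of allocating a fresh one. A sketch with smaller sizes than the benchmark above:

```python
import torch

tensors = [torch.randn(100, 100) for _ in range(10)]

# Pre-allocate the destination, then let stack fill it in place
out = torch.empty(len(tensors), 100, 100)
torch.stack(tensors, dim=0, out=out)

print(out.shape)  # torch.Size([10, 100, 100])
print(torch.equal(out[0], tensors[0]))  # True
```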
PyTorch Stack with Named Dimensions
In recent PyTorch versions, I’ve started using named tensors (still a prototype feature) to make my code more readable:
# Create tensors with named dimensions
tensor1 = torch.randn(2, 3).refine_names('batch', 'features')
tensor2 = torch.randn(2, 3).refine_names('batch', 'features')
# Stack with names
stacked = torch.stack([tensor1, tensor2], dim=0).refine_names('models', 'batch', 'features')
print(f"Named tensor shape: {stacked.shape}")
print(f"Dimension names: {stacked.names}")
This approach has significantly improved code readability in my team’s projects.
PyTorch’s stack function has been an essential tool in my deep learning toolkit for years. Whether I’m batching training data, combining model outputs, or structuring complex neural architectures, stack provides an elegant way to manipulate tensor dimensions.
The key to using stack effectively is understanding how it creates new dimensions and how this differs from other joining operations like concatenation. Once you grasp this concept, you’ll find yourself reaching for stack in many different scenarios.
I hope this guide helps you leverage the PyTorch stack in your projects. Remember that mastering tensor operations is fundamental to building efficient deep learning systems, and stack is one function worth having in your arsenal.