In this blog post, Get Started With Tensors with PyTorch, we walk through how to work with tensors using simple, copy‑paste examples you can use today.
Tensors are the workhorse behind modern AI and numerical computing. Think of them as powerful, N‑dimensional arrays that can live on CPUs or GPUs and support fast math, automatic differentiation, and clean syntax. In this article, we start with a high-level explanation, then move step by step through creating a tensor, indexing it, adding values, and performing common operations you’ll use in real projects.
We’ll use PyTorch because it’s concise, production‑ready, and maps naturally to how engineers think about data. The same ideas apply across frameworks (NumPy, TensorFlow, JAX), but PyTorch keeps the examples clean.
What’s a tensor and why it matters
A tensor generalizes familiar objects:
- 0-D: scalar (e.g., 3.14)
- 1-D: vector (e.g., [1, 2, 3])
- 2-D: matrix (e.g., a spreadsheet)
- 3-D and beyond: stacks of matrices, images, batches, sequences
Under the hood, a tensor is a block of memory plus metadata: shape (dimensions), dtype (precision), device (CPU or GPU), and sometimes a gradient buffer. Operations call into highly optimized libraries (BLAS, cuBLAS, cuDNN) so the same Python code can run fast on GPU. Autograd (reverse‑mode automatic differentiation) keeps track of operations so you can compute gradients for optimization and training.
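To make this concrete, here is a minimal sketch (assuming PyTorch is already installed; see Setup below) that builds tensors of increasing rank and prints the metadata described above:
import torch
scalar = torch.tensor(3.14)               # 0-D: a single value
vector = torch.tensor([1, 2, 3])          # 1-D
matrix = torch.tensor([[1, 2], [3, 4]])   # 2-D
batch = torch.zeros(8, 3, 32, 32)         # 4-D, e.g. a batch of 8 three-channel 32x32 images
for t in (scalar, vector, matrix, batch):
    # ndim = number of dimensions, shape = size along each dimension
    print(t.ndim, tuple(t.shape), t.dtype, t.device)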
Setup
# Install if needed (in a fresh environment recommended)
# pip install torch --index-url https://download.pytorch.org/whl/cpu
# or follow official instructions for CUDA builds
import torch
# Optional: pick a device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', device)
Create tensors
You can create tensors from Python lists, NumPy arrays, or with factory functions.
import torch
# From Python data
x = torch.tensor([1, 2, 3], dtype=torch.float32)
M = torch.tensor([[1, 2], [3, 4]], dtype=torch.int64)
# Factory functions
zeros = torch.zeros((2, 3)) # 2x3 of zeros, float32 by default
ones = torch.ones_like(zeros) # same shape as zeros, filled with ones
arng = torch.arange(0, 10, 2) # 0,2,4,6,8
randn = torch.randn(3, 4) # standard normal
# Dtype and device
fp16 = torch.randn(2, 2, dtype=torch.float16, device=device)
print(x.dtype, x.shape, x.device)
Tip: use arange when you want a fixed step size and linspace when you want a fixed number of evenly spaced points across a range (the end point is included).
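A quick side‑by‑side of the two (values picked arbitrarily):
steps = torch.arange(0, 1, 0.25)   # fixed step of 0.25, end point excluded -> 0.00, 0.25, 0.50, 0.75
points = torch.linspace(0, 1, 5)   # exactly 5 points, end point included -> 0.00, 0.25, 0.50, 0.75, 1.00
print(steps)
print(points)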
Indexing and slicing
Torch indexing feels like NumPy: square brackets, slices, masks, and advanced indexing.
v = torch.tensor([10, 20, 30, 40, 50])
print(v[0]) # 10
print(v[-1]) # 50
print(v[1:4]) # [20, 30, 40]
A = torch.arange(1, 13).reshape(3, 4) # [[1..4],[5..8],[9..12]]
print(A[0, 0]) # top-left
print(A[:, 0]) # first column
print(A[1:, 2:]) # bottom-right 2x2 block
# Boolean mask
mask = A % 2 == 0
print(A[mask]) # all even numbers
# Advanced indexing with a list of indices
rows = torch.tensor([0, 2])
cols = torch.tensor([1, 3])
print(A[rows, cols]) # elements at (0,1) and (2,3)
Adding values and common math
PyTorch supports element‑wise ops, broadcasting, and linear algebra. Out‑of‑place operations return new tensors; in‑place operations (their names end with an underscore, like add_) modify existing tensors.
x = torch.tensor([1.0, 2.0, 3.0])
# Element-wise
print(x + 10) # [11, 12, 13]
print(x * 2) # [2, 4, 6]
# In-place (modifies x)
x.add_(5) # x becomes [6, 7, 8]
# Broadcasting: shapes auto-expand when compatible
B = torch.ones((3, 4))
bias = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(B + bias) # bias added to each row
# Reductions
print(B.sum()) # scalar sum
print(B.mean(dim=0)) # column-wise mean -> shape (4,)
print(B.max(dim=1)) # per-row max + indices
# Linear algebra
A = torch.randn(2, 3)
W = torch.randn(3, 4)
Y = A @ W # matrix multiply (2x4)
u = torch.randn(4)
v = torch.randn(4)
print(torch.dot(u, v)) # dot product
Broadcasting rules: shapes are compared from the trailing dimensions backwards, and each pair of dimensions must either match or be 1; PyTorch virtually “stretches” size‑1 dimensions without copying data. If the shapes don’t align, you’ll get a runtime error, so check shapes with .shape.
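As a small illustration of these rules (the shapes here are arbitrary):
row = torch.ones(3, 4)                   # (3, 4)
col = torch.arange(3.0).reshape(3, 1)    # (3, 1): the size-1 dim stretches to 4
print((row + col).shape)                 # torch.Size([3, 4])
bad = torch.ones(3)                      # (3,) does not align with the trailing dim 4
try:
    row + bad
except RuntimeError:
    print('incompatible shapes:', row.shape, bad.shape)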
Working with shapes
Reshaping is the glue for real workloads. You’ll stack batches, flatten features, reorder channels, and add singleton dimensions.
T = torch.arange(24).reshape(2, 3, 4) # (batch=2, rows=3, cols=4)
# Reshape/flatten
flat = T.reshape(2, -1) # infer last dim -> (2, 12)
# Transpose/permute
T_tr = T.transpose(1, 2) # swap dims 1 and 2 -> (2, 4, 3)
T_perm = T.permute(2, 0, 1) # reorder arbitrarily -> (4, 2, 3)
# Add/remove singleton dims
x = torch.tensor([1, 2, 3]) # (3,)
xu = x.unsqueeze(0) # (1, 3)
xs = xu.squeeze(0) # back to (3,)
# Concatenate and stack
A = torch.ones(2, 3)
B = torch.zeros(2, 3)
cat_rows = torch.cat([A, B], dim=0) # (4, 3)
stack_newdim = torch.stack([A, B], dim=0) # (2, 2, 3)
Use reshape for a safe reshape (it handles non‑contiguous memory by copying if needed). view never copies, but it requires a contiguous tensor.
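A minimal sketch of the difference: transposing makes a tensor non‑contiguous, at which point view raises an error while reshape quietly copies.
T = torch.arange(6).reshape(2, 3)
Tt = T.t()                        # transpose: a non-contiguous view of the same memory
print(Tt.is_contiguous())         # False
print(Tt.reshape(6))              # works, copying because the layout is non-contiguous
try:
    Tt.view(6)                    # view needs contiguous memory
except RuntimeError:
    print('view failed; call .contiguous() first or use reshape')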
Device management and moving data
Switching between CPU and GPU is explicit and simple.
# Create on CPU, move to GPU if available
x = torch.randn(1024, 1024)
x = x.to(device)
# Bring back to CPU (e.g., to convert to NumPy)
xcpu = x.to('cpu')
Moving large tensors between CPU and GPU is relatively expensive. Keep tensors on the device where the bulk of computation happens.
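One practical consequence, shown as a sketch: when you only need a small summary on the CPU, reduce on the device first and transfer the result rather than the full tensor (sizes here are arbitrary).
big = torch.randn(4096, 4096, device=device)   # ~64 MB of float32 data
total = big.sum().to('cpu')                    # move a single scalar, not the whole tensor
print(total.item())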
NumPy interoperability
PyTorch and NumPy can share memory (zero‑copy) on CPU.
import numpy as np
# NumPy -> Torch (shares memory)
a_np = np.array([1, 2, 3], dtype=np.float32)
a_t = torch.from_numpy(a_np) # CPU only
# Change in one reflects in the other
a_np[0] = 99
print(a_t) # tensor([99., 2., 3.])
# Torch -> NumPy
b_t = torch.tensor([4.0, 5.0, 6.0])
b_np = b_t.numpy() # requires CPU tensor
# If tensor requires grad, detach first
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
wnp = w.detach().cpu().numpy()
Autograd in one minute
Autograd builds a computation graph as you operate on tensors with requires_grad=True. Calling backward() computes gradients with respect to those tensors.
w = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)
loss = (w**2).sum() # simple quadratic: 4 + 1 + 0.25 = 5.25
loss.backward()
print(w.grad) # gradient is 2*w -> [4.0, -2.0, 1.0]
# Zero gradients between steps in optimization loops
w.grad.zero_()
Note: avoid in‑place ops on tensors that require grad if those ops are part of the computation graph; they can invalidate history. When in doubt, use out‑of‑place operations or .clone().
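For instance, PyTorch blocks in‑place updates on a leaf tensor that requires grad, while a clone stays in the graph and is safe to modify (a small sketch):
w = torch.tensor([1.0, 2.0], requires_grad=True)
try:
    w.add_(1.0)                   # in-place op on a leaf that requires grad -> RuntimeError
except RuntimeError:
    print('in-place update on a leaf tensor that requires grad is not allowed')
w2 = w.clone()                    # out-of-place copy that remains part of the graph
w2.add_(1.0)                      # safe here: clone's backward does not need the old values
w2.sum().backward()
print(w.grad)                     # tensor([1., 1.])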
Mini workflow example
Let’s put it together: normalize a batch, apply a linear layer, and compute a simple loss.
torch.manual_seed(0)
# Fake batch: 32 samples, 10 features
X = torch.randn(32, 10, device=device)
# Normalize per-feature: (X - mean) / std
mean = X.mean(dim=0, keepdim=True)
std = X.std(dim=0, unbiased=False, keepdim=True)
Xn = (X - mean) / (std + 1e-6)
# Linear layer: Y = Xn @ W + b
W = torch.randn(10, 3, device=device, requires_grad=True)
b = torch.zeros(3, device=device, requires_grad=True)
Y = Xn @ W + b # (32, 3)
# Target scores (pretend regression)
target = torch.randn(32, 3, device=device)
loss = ((Y - target) ** 2).mean() # MSE
loss.backward() # compute gradients w.r.t. W and b
print('Loss:', float(loss))
print('Grad norm W:', W.grad.norm().item())
print('Grad norm b:', b.grad.norm().item())
This snippet demonstrates common patterns: reductions, broadcasting, matrix multiplies, and autograd across CPU/GPU seamlessly.
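If you want to take it one step further, a single manual gradient‑descent update on the parameters from the snippet above could look like this sketch (lr is an arbitrary learning rate):
lr = 0.1
with torch.no_grad():             # update parameters without recording these ops
    W -= lr * W.grad
    b -= lr * b.grad
W.grad.zero_()                    # clear gradients before the next backward()
b.grad.zero_()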
Cheat sheet
- Create: torch.tensor, zeros, ones, arange, randn
- Inspect: .shape, .dtype, .device
- Index: t[i], t[i:j], masks, advanced indexing
- Math: element‑wise ops, reductions (sum, mean, max), @ for matmul
- Shapes: reshape, permute, unsqueeze/squeeze, cat, stack
- Device: .to('cuda') / .to('cpu')
- Autograd: set requires_grad=True, call backward()
- I/O: from_numpy and .numpy() (CPU tensors)
Wrap up
Tensors turn math into scalable, production‑ready code. You learned how to create, index, add, reshape, and compute with tensors; how broadcasting and reductions work; and how to use devices and autograd. With these building blocks, you can handle everything from feature engineering to deep learning training loops efficiently.
Next step: paste the snippets into a notebook or script, swap in your own data shapes, and build from there. When you’re ready to scale up, keep your tensors on the right device and let PyTorch’s kernels do the heavy lifting.