Addition, multiplication, transpose. Matrix as linear transformation.
Deep-dive lesson: an accessible entry point, but dense material. Use worked examples and spaced repetition.
Matrices look like “just tables of numbers,” but their real power is that they do something: they represent linear transformations—rules that move, rotate, stretch, and mix vectors in a consistent way.
A matrix is a linear map from ℝⁿ to ℝᵐ. You can add matrices and scale them entry-by-entry. You multiply matrices to compose linear maps (apply one, then the other). The transpose swaps rows and columns and often converts “row” viewpoints into “column” viewpoints.
You already know vectors: you can add them and scale them, and those operations have geometric meaning (combine directions, change magnitude). Matrices extend this idea: they describe linear rules that take an input vector and produce an output vector.
If vectors are “arrows,” then matrices are “machines that transform arrows.” Understanding addition, multiplication, and transpose is mostly about understanding how these machines combine.
A matrix A is a rectangular array of numbers with m rows and n columns. We write A ∈ ℝᵐˣⁿ.
Example:
A = [ [1, 2, 0],
[−1, 3, 4] ] (2×3)
A matrix A ∈ ℝᵐˣⁿ represents a linear map
A: ℝⁿ → ℝᵐ
Given an input vector x ∈ ℝⁿ, the output is y = Ax ∈ ℝᵐ.
The most important way to “see” A is via its columns.
Let A have columns a₁, a₂, …, aₙ, where each aⱼ ∈ ℝᵐ. Write
A = [ a₁ a₂ … aₙ ]
If x = [x₁, x₂, …, xₙ]ᵀ, then
Ax = x₁a₁ + x₂a₂ + … + xₙaₙ
That is: Ax is a linear combination of the columns of A, using the entries of x as weights.
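The column view is easy to check in plain Python (no libraries assumed); the helper name `matvec_columns` is made up for illustration:

```python
def matvec_columns(A, x):
    # Compute Ax as a linear combination of the columns of A,
    # where A is stored as a list of rows.
    m, n = len(A), len(A[0])
    y = [0] * m
    for j in range(n):        # weight column j by x[j] ...
        for i in range(m):    # ... and accumulate into y
            y[i] += x[j] * A[i][j]
    return y

A = [[1, 2, 0],
     [-1, 3, 4]]              # the 2x3 example above
print(matvec_columns(A, [1, 1, 1]))  # 1*a1 + 1*a2 + 1*a3 = [3, 6]
```

Setting x = e₁ = [1, 0, 0]ᵀ picks out the first column, [1, −1]ᵀ, exactly as the formula predicts.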
A transformation T is linear if for all vectors u, v and scalars c:
T(u + v) = T(u) + T(v)
T(cu) = cT(u)
Matrix multiplication Ax always defines a linear transformation, and (in finite-dimensional real vector spaces) every linear transformation can be represented by some matrix.
This node is about operations that correspond to natural ways of combining these linear machines:
When you add vectors, you get another vector that represents the combined effect. For matrices, addition and scaling let you build new linear transformations from old ones.
If A and B both take ℝⁿ → ℝᵐ, then you can define a new transformation:
(A + B)(x) = Ax + Bx
This is “add the outputs” for the same input.
You may add matrices only when they have the same shape.
If A, B ∈ ℝᵐˣⁿ, then C = A + B ∈ ℝᵐˣⁿ is defined by
cᵢⱼ = aᵢⱼ + bᵢⱼ
So it’s entry-by-entry.
For scalar c ∈ ℝ and matrix A ∈ ℝᵐˣⁿ:
(cA)ᵢⱼ = c·aᵢⱼ
Again, entry-by-entry.
If A and B represent linear maps ℝⁿ → ℝᵐ, then A + B represents the map that sends x to Ax + Bx.
A useful identity connects the “entry rule” to the “transformation rule”:
(A + B)x = Ax + Bx
Proof (by columns; a good mental model):
Let A = [a₁ … aₙ], B = [b₁ … bₙ]. Then A + B = [a₁+b₁ … aₙ+bₙ].
So for x = [x₁,…,xₙ]ᵀ:
(A+B)x
= x₁(a₁+b₁) + … + xₙ(aₙ+bₙ)
= (x₁a₁ + … + xₙaₙ) + (x₁b₁ + … + xₙbₙ)
= Ax + Bx
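The same identity can be spot-checked numerically; this plain-Python sketch uses made-up helper names (`mat_add`, `matvec`):

```python
def mat_add(A, B):
    # entrywise sum; shapes must match
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(A, x):
    # row-by-dot-product view of Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, -2], [0, 3]]
B = [[4, 1], [-1, 2]]
x = [1, 2]

lhs = matvec(mat_add(A, B), x)                              # (A + B)x
rhs = [u + v for u, v in zip(matvec(A, x), matvec(B, x))]   # Ax + Bx
print(lhs == rhs)  # True: both are [3, 9]
```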
Because addition and scalar multiplication behave like vector operations, the set ℝᵐˣⁿ is a vector space.
Here are the most used properties (they mirror vector properties):
A + B = B + A (commutativity)
(A + B) + C = A + (B + C) (associativity)
c(A + B) = cA + cB and (c + d)A = cA + dA (distributivity)
Before adding or scaling, check that both matrices have the same shape.
You’ll rely on this habit when you start solving linear systems and doing matrix calculus.
Matrix multiplication is not “entry-by-entry.” It is designed so that multiplying matrices corresponds to composing linear transformations.
If B maps ℝᵖ → ℝⁿ and A maps ℝⁿ → ℝᵐ, then doing B first and then A is a map
ℝᵖ → ℝᵐ
Matrix multiplication encodes exactly that:
(AB)x = A(Bx)
So the order matters: AB means “apply B, then apply A.”
If A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, then the product AB is defined and has shape ℝᵐˣᵖ.
The “inner” dimensions must match (n).
A quick mnemonic:
(m×n)·(n×p) = (m×p)
For C = AB, the entry cᵢⱼ is
cᵢⱼ = ∑ₖ₌₁ⁿ aᵢₖ bₖⱼ
Interpretation: take row i of A and column j of B, multiply matching entries, and sum.
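Both the entry rule and the composition property can be checked in a few lines of plain Python (helper names are made up for illustration):

```python
def matmul(A, B):
    # c[i][j] = sum_k A[i][k] * B[k][j]; inner dimensions must match
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    # row-by-dot-product view of Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 0], [-1, 3, 4]]      # 2x3: a map R^3 -> R^2
B = [[2, 1], [0, -1], [5, 2]]    # 3x2: a map R^2 -> R^3
x = [1, 1]

print(matvec(matmul(A, B), x))   # (AB)x = [1, 22]
print(matvec(A, matvec(B, x)))   # A(Bx) = [1, 22], the same map
```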
Let bⱼ be the j-th column of B. Then the j-th column of C = AB is
(AB)ⱼ = Abⱼ
So multiplying on the left by A means: apply A to each column of B.
This is a powerful viewpoint: each column of AB is a linear combination of the columns of A, with weights taken from the corresponding column of B.
In general,
AB ≠ BA
because composition of functions is not commutative. Even when both products exist (both matrices are square and the same size), they can differ.
If shapes match so everything is defined:
(AB)C = A(BC)
This matters because it lets you write ABC without parentheses (the product is unambiguous).
If shapes match:
A(B + C) = AB + AC
(A + B)C = AC + BC
The identity matrix Iₙ ∈ ℝⁿˣⁿ satisfies
Iₙ x = x for all x ∈ ℝⁿ
And for any compatible matrix A:
Iₘ A = A, AIₙ = A
Suppose A scales x by 2 and y by 1:
A = [ [2, 0],
[0, 1] ]
Suppose B swaps x and y:
B = [ [0, 1],
[1, 0] ]
Then AB means “swap, then scale.” BA means “scale, then swap.” Those are different operations, so the matrices differ.
| Operation | When defined? | Rule | Transformation meaning |
|---|---|---|---|
| A + B | same shape m×n | (aᵢⱼ + bᵢⱼ) | add outputs: (A+B)x = Ax + Bx |
| cA | always | (c·aᵢⱼ) | scale outputs: (cA)x = c(Ax) |
| AB | (m×n)(n×p) | cᵢⱼ = ∑ aᵢₖbₖⱼ | compose maps: (AB)x = A(Bx) |
The transpose looks simple (flip across the diagonal), but it appears everywhere: in dot products, in least-squares problems, and in the definition of symmetric matrices.
If A ∈ ℝᵐˣⁿ, then Aᵀ ∈ ℝⁿˣᵐ and
(Aᵀ)ᵢⱼ = aⱼᵢ
So rows become columns.
Example:
A = [ [1, 2, 3],
[4, 5, 6] ] (2×3)
Aᵀ = [ [1, 4],
[2, 5],
[3, 6] ] (3×2)
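In plain Python, this flip is one line with `zip` (the helper name `transpose` is made up):

```python
def transpose(A):
    # row i of A becomes column i of the result
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]                  # 2x3
print(transpose(A))              # [[1, 4], [2, 5], [3, 6]]  (3x2)
```

Applying it twice returns the original matrix, matching property (1) below.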
These are not just algebra tricks; they encode how “flipping” interacts with other operations.
1) Double transpose returns the original:
(Aᵀ)ᵀ = A
2) Transpose distributes over addition and scalar multiplication:
(A + B)ᵀ = Aᵀ + Bᵀ
(cA)ᵀ = cAᵀ
3) Transpose reverses multiplication order:
(AB)ᵀ = Bᵀ Aᵀ
This one is extremely important.
Let A ∈ ℝᵐˣⁿ, B ∈ ℝⁿˣᵖ.
Consider entry (i, j) of (AB)ᵀ.
((AB)ᵀ)ᵢⱼ
= (AB)ⱼᵢ
= ∑ₖ₌₁ⁿ aⱼₖ bₖᵢ
Now look at (BᵀAᵀ)ᵢⱼ. Here Bᵀ ∈ ℝᵖˣⁿ and Aᵀ ∈ ℝⁿˣᵐ, so the product is ℝᵖˣᵐ, matching.
(BᵀAᵀ)ᵢⱼ
= ∑ₖ₌₁ⁿ (bᵀ)ᵢₖ (aᵀ)ₖⱼ
= ∑ₖ₌₁ⁿ bₖᵢ aⱼₖ
= ∑ₖ₌₁ⁿ aⱼₖ bₖᵢ
So the entries match for all i, j ⇒ the matrices are equal.
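The proof can be spot-checked numerically; this plain-Python sketch reuses made-up helpers (`matmul`, `transpose`):

```python
def matmul(A, B):
    # row of A against column of B (columns obtained via zip)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0], [-1, 3, 4]]               # 2x3
B = [[2, 1], [0, -1], [5, 2]]             # 3x2

lhs = transpose(matmul(A, B))             # (AB)^T, shape 2x2
rhs = matmul(transpose(B), transpose(A))  # B^T A^T, shape 2x2
print(lhs == rhs)  # True
```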
For a column vector x ∈ ℝⁿ, xᵀ is a 1×n row vector.
Dot product in matrix notation:
xᵀy = ∑ᵢ xᵢ yᵢ
This is a bridge between “vector algebra” and “matrix algebra.”
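In code, xᵀy is just a sum of matching products:

```python
x = [1, 2, 3]
y = [4, 5, 6]
dot = sum(xi * yi for xi, yi in zip(x, y))
print(dot)  # 1*4 + 2*5 + 3*6 = 32
```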
A system like
2x + y = 5
−x + 3y = 1
can be written as Ax = b with
A = [ [2, 1],
[−1, 3] ], x = [x, y]ᵀ, b = [5, 1]ᵀ
Then solving the system means finding x such that Ax matches b.
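A candidate solution can be verified with a single matrix-vector product. Substitution (y = 5 − 2x from the first equation) gives x = 2, y = 1 for the system above; the sketch below (plain Python, made-up `matvec` helper) confirms Ax = b:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1],
     [-1, 3]]
b = [5, 1]
x = [2, 1]                  # candidate found by substitution
print(matvec(A, x) == b)    # True: [2*2 + 1, -2 + 3*1] = [5, 1]
```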
This node prepares you for Gaussian elimination and matrix methods in Systems of Linear Equations.
In ℝ², a matrix can represent rotations, reflections, scalings, and shears.
The column view is especially geometric: the columns of A are where the basis vectors go.
If e₁ = [1,0]ᵀ and e₂ = [0,1]ᵀ, then
Ae₁ = first column of A
Ae₂ = second column of A
So A is fully determined by where it sends the basis.
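This is quick to confirm: applying A to e₁ and e₂ reads off its columns (plain Python, made-up helper name):

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 0],
     [0, 1]]                 # scale x by 2, leave y alone
e1, e2 = [1, 0], [0, 1]
print(matvec(A, e1))         # [2, 0], the first column of A
print(matvec(A, e2))         # [0, 1], the second column of A
```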
If B does “normalize then mix features” and A does “project to a smaller space,” then AB is the combined pipeline.
This viewpoint shows up in ML models and in the Jacobian, where derivatives themselves form matrices and are multiplied via chain rule (composition).
If a data matrix X stores examples in rows (common) versus columns (also common), transpose switches conventions.
Transpose lets formulas remain consistent even when you change how you store data.
Once you’re fluent in these operations, many “next” ideas, such as solving linear systems and working with Jacobians, become more natural.
Matrices are not just notation; they are a compact language for linear structure.
Let A = [[1, −2], [0, 3]] and B = [[4, 1], [−1, 2]]. Compute C = 2A + B. Then verify on a vector x = [1, 2]ᵀ that Cx = 2(Ax) + Bx.
Compute 2A entrywise:
2A = [[2·1, 2·(−2)], [2·0, 2·3]] = [[2, −4], [0, 6]].
Add B entrywise:
C = 2A + B
= [[2+4, −4+1], [0+(−1), 6+2]]
= [[6, −3], [−1, 8]].
Compute Ax:
Ax = [[1, −2], [0, 3]] [1, 2]ᵀ
= [1·1 + (−2)·2, 0·1 + 3·2]ᵀ
= [1 − 4, 6]ᵀ
= [−3, 6]ᵀ.
Compute 2(Ax) = 2[−3, 6]ᵀ = [−6, 12]ᵀ.
Compute Bx:
Bx = [[4, 1], [−1, 2]] [1, 2]ᵀ
= [4·1 + 1·2, (−1)·1 + 2·2]ᵀ
= [6, 3]ᵀ.
Add outputs:
2(Ax) + Bx = [−6, 12]ᵀ + [6, 3]ᵀ = [0, 15]ᵀ.
Compute Cx directly:
Cx = [[6, −3], [−1, 8]] [1, 2]ᵀ
= [6·1 + (−3)·2, (−1)·1 + 8·2]ᵀ
= [6 − 6, −1 + 16]ᵀ
= [0, 15]ᵀ.
Conclude Cx = 2(Ax) + Bx, matching the transformation interpretation.
Insight: Matrix addition/scaling are designed so that adding/scaling matrices corresponds exactly to adding/scaling the outputs of the linear transformations they represent.
Let A = [[2, 0], [0, 1]] (scale x by 2) and B = [[0, 1], [1, 0]] (swap coordinates). Compute AB and BA, and test on x = [3, 5]ᵀ.
Compute AB using row-by-column:
AB = [[2, 0], [0, 1]] [[0, 1], [1, 0]]
First row vs columns:
c₁₁ = 2·0 + 0·1 = 0
c₁₂ = 2·1 + 0·0 = 2
Second row vs columns:
c₂₁ = 0·0 + 1·1 = 1
c₂₂ = 0·1 + 1·0 = 0
So AB = [[0, 2], [1, 0]].
Compute BA:
BA = [[0, 1], [1, 0]] [[2, 0], [0, 1]]
First row:
d₁₁ = 0·2 + 1·0 = 0
d₁₂ = 0·0 + 1·1 = 1
Second row:
d₂₁ = 1·2 + 0·0 = 2
d₂₂ = 1·0 + 0·1 = 0
So BA = [[0, 1], [2, 0]].
Observe AB ≠ BA.
Test on x = [3, 5]ᵀ:
Bx = [5, 3]ᵀ (swap)
A(Bx) = A[5, 3]ᵀ = [10, 3]ᵀ
So (AB)x = [10, 3]ᵀ.
Other order:
Ax = [6, 5]ᵀ (scale x)
B(Ax) = B[6, 5]ᵀ = [5, 6]ᵀ
So (BA)x = [5, 6]ᵀ.
Insight: Matrix multiplication encodes composition: AB means “do B first, then A.” Different order ⇒ different transformation.
Let A = [[1, 2, 0], [−1, 3, 4]] (2×3) and B = [[2, 1], [0, −1], [5, 2]] (3×2). Compute (AB)ᵀ and compare with BᵀAᵀ.
Check shapes:
A is 2×3 and B is 3×2, so AB is 2×2 and transpose will be 2×2.
Compute AB:
AB = [[1, 2, 0], [−1, 3, 4]] [[2, 1], [0, −1], [5, 2]]
Entry (1,1): 1·2 + 2·0 + 0·5 = 2
Entry (1,2): 1·1 + 2·(−1) + 0·2 = 1 − 2 = −1
Entry (2,1): (−1)·2 + 3·0 + 4·5 = −2 + 20 = 18
Entry (2,2): (−1)·1 + 3·(−1) + 4·2 = −1 − 3 + 8 = 4
So AB = [[2, −1], [18, 4]].
Transpose it:
(AB)ᵀ = [[2, 18], [−1, 4]].
Compute Bᵀ and Aᵀ:
Bᵀ = [[2, 0, 5], [1, −1, 2]] (2×3)
Aᵀ = [[1, −1], [2, 3], [0, 4]] (3×2).
Multiply BᵀAᵀ (2×3)(3×2) = 2×2:
Entry (1,1): [2,0,5]·[1,2,0]ᵀ = 2·1 + 0·2 + 5·0 = 2
Entry (1,2): [2,0,5]·[−1,3,4]ᵀ = 2·(−1) + 0·3 + 5·4 = −2 + 20 = 18
Entry (2,1): [1,−1,2]·[1,2,0]ᵀ = 1·1 + (−1)·2 + 2·0 = −1
Entry (2,2): [1,−1,2]·[−1,3,4]ᵀ = 1·(−1) + (−1)·3 + 2·4 = −1 − 3 + 8 = 4
So BᵀAᵀ = [[2, 18], [−1, 4]].
Compare:
(AB)ᵀ = BᵀAᵀ.
Insight: Transpose flips multiplication order because it swaps the roles of rows and columns; the entrywise sum ∑ aᵢₖbₖⱼ becomes the same sum but viewed from the other side.
A matrix A ∈ ℝᵐˣⁿ represents a linear transformation ℝⁿ → ℝᵐ via x ↦ Ax.
Column view: Ax = x₁a₁ + … + xₙaₙ (a linear combination of columns of A).
Matrix addition and scalar multiplication are componentwise and correspond to adding/scaling transformation outputs.
Matrix multiplication is defined to match composition: (AB)x = A(Bx).
Shape rule for multiplication: (m×n)(n×p) = (m×p); inner dimensions must match.
Multiplication is generally not commutative: AB ≠ BA.
Transpose swaps rows and columns: (Aᵀ)ᵢⱼ = aⱼᵢ, and (AB)ᵀ = BᵀAᵀ.
Trying to add matrices of different shapes (e.g., 2×3 plus 3×2).
Assuming matrix multiplication is commutative because scalar multiplication is (in general it isn’t).
Forgetting the shape rule and multiplying in an invalid order (or expecting the wrong output size).
Thinking multiplication is entrywise (it’s row-by-column sums, designed for composition).
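A small shape guard catches the first three mistakes before any arithmetic runs; `check_shapes` is a hypothetical helper, sketched in plain Python:

```python
def check_shapes(op, A, B):
    # op is "add" (shapes must be identical) or "mul"
    # (columns of A must equal rows of B); returns the result shape.
    ma, na = len(A), len(A[0])
    mb, nb = len(B), len(B[0])
    if op == "add":
        if (ma, na) != (mb, nb):
            raise ValueError(f"cannot add {ma}x{na} and {mb}x{nb}")
        return (ma, na)
    if na != mb:
        raise ValueError(f"cannot multiply {ma}x{na} by {mb}x{nb}")
    return (ma, nb)

A23 = [[0] * 3 for _ in range(2)]     # 2x3
B32 = [[0] * 2 for _ in range(3)]     # 3x2
print(check_shapes("mul", A23, B32))  # (2, 2): the product's shape
```

Trying `check_shapes("add", A23, B32)` raises a ValueError, since a 2×3 plus a 3×2 is undefined.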
Let A = [[3, −1], [2, 4]] and B = [[0, 5], [−2, 1]]. Compute A + B and 3A − 2B.
Hint: Addition/subtraction and scalar multiplication are componentwise. Do 3A and 2B first, then subtract.
A + B = [[3+0, −1+5], [2+(−2), 4+1]] = [[3, 4], [0, 5]].
3A = [[9, −3], [6, 12]]
2B = [[0, 10], [−4, 2]]
3A − 2B = [[9−0, −3−10], [6−(−4), 12−2]] = [[9, −13], [10, 10]].
Let A be 2×3 and B be 3×4. (a) What is the shape of AB? (b) Is BA defined? If it is, what would its shape be?
Hint: Use (m×n)(n×p) = (m×p). For BA, the inner dimensions would be 4 and 2—do they match?
(a) AB has shape (2×3)(3×4) = 2×4.
(b) BA would require (3×4)(2×3), but the inner dimensions 4 and 2 do not match, so BA is not defined.
Given A = [[1, 2], [3, 4]] and B = [[2, 0], [1, −1]], compute (AB)ᵀ and BᵀAᵀ to verify (AB)ᵀ = BᵀAᵀ.
Hint: First compute AB (2×2). Then transpose. Separately compute Bᵀ and Aᵀ and multiply in that order.
AB = [[1,2],[3,4]] [[2,0],[1,−1]]
= [[1·2+2·1, 1·0+2·(−1)], [3·2+4·1, 3·0+4·(−1)]]
= [[4, −2], [10, −4]].
(AB)ᵀ = [[4, 10], [−2, −4]].
Bᵀ = [[2,1],[0,−1]], Aᵀ = [[1,3],[2,4]].
BᵀAᵀ = [[2,1],[0,−1]] [[1,3],[2,4]]
= [[2·1+1·2, 2·3+1·4], [0·1+(−1)·2, 0·3+(−1)·4]]
= [[4, 10], [−2, −4]].
They match, so (AB)ᵀ = BᵀAᵀ.
Next nodes you’re ready for: Systems of Linear Equations (Gaussian elimination) and the Jacobian (derivatives as matrices composed via the chain rule).