Addition, multiplication, transpose. Matrix as linear transformation.
Deep-dive lesson: an accessible entry point, but dense material. Use worked examples and spaced repetition.
Matrices look like “just tables of numbers,” but their real power is that they do something: they represent linear transformations—rules that move, rotate, stretch, and mix vectors in a consistent way.
A matrix is a linear map from ℝⁿ to ℝᵐ. You can add matrices and scale them entry-by-entry. You multiply matrices to compose linear maps (apply one, then the other). The transpose swaps rows and columns and often converts “row” viewpoints into “column” viewpoints.
You already know vectors: you can add them and scale them, and those operations have geometric meaning (combine directions, change magnitude). Matrices extend this idea: they describe linear rules that take an input vector and produce an output vector.
If vectors are “arrows,” then matrices are “machines that transform arrows.” Understanding addition, multiplication, and transpose is mostly about understanding how these machines combine.
A matrix A is a rectangular array of numbers with m rows and n columns. We write A ∈ ℝᵐˣⁿ.
Example:
A = [ [1, 2, 0],
[−1, 3, 4] ] (2×3)
A matrix A ∈ ℝᵐˣⁿ represents a linear map
A: ℝⁿ → ℝᵐ
Given an input vector x ∈ ℝⁿ, the output is y = Ax ∈ ℝᵐ.
The most important way to “see” A is via its columns.
Let A have columns a₁, a₂, …, aₙ, where each aⱼ ∈ ℝᵐ. Write
A = [ a₁ a₂ … aₙ ]
If x = [x₁, x₂, …, xₙ]ᵀ, then
Ax = x₁a₁ + x₂a₂ + … + xₙaₙ
That is: Ax is a linear combination of the columns of A, using the entries of x as weights.
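The column view is easy to check in plain Python (no libraries assumed); the helper name `matvec_columns` is made up for illustration:

```python
def matvec_columns(A, x):
    # Compute Ax as a linear combination of the columns of A,
    # where A is stored as a list of rows.
    m, n = len(A), len(A[0])
    y = [0] * m
    for j in range(n):        # weight column j by x[j] ...
        for i in range(m):    # ... and accumulate into y
            y[i] += x[j] * A[i][j]
    return y

A = [[1, 2, 0],
     [-1, 3, 4]]              # the 2x3 example above
print(matvec_columns(A, [1, 1, 1]))  # 1*a1 + 1*a2 + 1*a3 = [3, 6]
```

Setting x = e₁ = [1, 0, 0]ᵀ picks out the first column, [1, −1]ᵀ, exactly as the formula predicts.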
A transformation T is linear if for all vectors u, v and scalars c:
T(u + v) = T(u) + T(v)
T(cu) = cT(u)
Matrix multiplication Ax always defines a linear transformation, and (in finite-dimensional real vector spaces) every linear transformation can be represented by some matrix.
This node is about operations that correspond to natural ways of combining these linear machines:
When you add vectors, you get another vector that represents the combined effect. For matrices, addition and scaling let you build new linear transformations from old ones.
If A and B both take ℝⁿ → ℝᵐ, then you can define a new transformation:
(A + B)(x) = Ax + Bx
This is “add the outputs” for the same input.
You may add matrices only when they have the same shape.
If A, B ∈ ℝᵐˣⁿ, then C = A + B ∈ ℝᵐˣⁿ is defined by
cᵢⱼ = aᵢⱼ + bᵢⱼ
So it’s entry-by-entry.
For scalar c ∈ ℝ and matrix A ∈ ℝᵐˣⁿ:
(cA)ᵢⱼ = c·aᵢⱼ
Again, entry-by-entry.
If A and B represent linear maps ℝⁿ → ℝᵐ, then A + B represents the map that sends x to Ax + Bx.
A useful identity connects the “entry rule” to the “transformation rule”:
(A + B)x = Ax + Bx
Proof (by columns; a good mental model):
Let A = [a₁ … aₙ], B = [b₁ … bₙ]. Then A + B = [a₁+b₁ … aₙ+bₙ].
So for x = [x₁,…,xₙ]ᵀ:
(A+B)x
= x₁(a₁+b₁) + … + xₙ(aₙ+bₙ)
= (x₁a₁ + … + xₙaₙ) + (x₁b₁ + … + xₙbₙ)
= Ax + Bx
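The same identity can be spot-checked numerically; this plain-Python sketch uses made-up helper names (`mat_add`, `matvec`):

```python
def mat_add(A, B):
    # entrywise sum; shapes must match
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(A, x):
    # row-by-dot-product view of Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, -2], [0, 3]]
B = [[4, 1], [-1, 2]]
x = [1, 2]

lhs = matvec(mat_add(A, B), x)                              # (A + B)x
rhs = [u + v for u, v in zip(matvec(A, x), matvec(B, x))]   # Ax + Bx
print(lhs == rhs)  # True: both are [3, 9]
```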
Because addition and scalar multiplication behave like vector operations, the set ℝᵐˣⁿ is a vector space.
Here are the most used properties (they mirror vector properties):
A + B = B + A (commutativity)
(A + B) + C = A + (B + C) (associativity)
c(A + B) = cA + cB and (c + d)A = cA + dA (distributivity)
Before adding or scaling, check that both matrices have the same shape.
You’ll rely on this habit when you start solving linear systems and doing matrix calculus.
Matrix multiplication is not “entry-by-entry.” It is designed so that multiplying matrices corresponds to composing linear transformations.
If B maps ℝᵖ → ℝⁿ and A maps ℝⁿ → ℝᵐ, then doing B first and then A is a map
ℝᵖ → ℝᵐ
Matrix multiplication encodes exactly that:
(AB)x = A(Bx)
So the order matters: AB means “apply B, then apply A.”
If A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ, then the product AB is defined and has shape ℝᵐˣᵖ.
The “inner” dimensions must match (n).
A quick mnemonic:
(m×n)·(n×p) = (m×p)
For C = AB, the entry cᵢⱼ is
cᵢⱼ = ∑ₖ₌₁ⁿ aᵢₖ bₖⱼ
Interpretation: take row i of A and column j of B, multiply matching entries, and sum.
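Both the entry rule and the composition property can be checked in a few lines of plain Python (helper names are made up for illustration):

```python
def matmul(A, B):
    # c[i][j] = sum_k A[i][k] * B[k][j]; inner dimensions must match
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    # row-by-dot-product view of Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 0], [-1, 3, 4]]      # 2x3: a map R^3 -> R^2
B = [[2, 1], [0, -1], [5, 2]]    # 3x2: a map R^2 -> R^3
x = [1, 1]

print(matvec(matmul(A, B), x))   # (AB)x = [1, 22]
print(matvec(A, matvec(B, x)))   # A(Bx) = [1, 22], the same map
```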
Let bⱼ be the j-th column of B. Then the j-th column of C = AB is
(AB)ⱼ = Abⱼ
So multiplying on the left by A means: apply A to each column of B.
This is a powerful viewpoint: each column of AB is a linear combination of the columns of A, with weights taken from the corresponding column of B.
In general,
AB ≠ BA
because composition of functions is not commutative. Even when both products exist (both matrices are square and the same size), they can differ.
If shapes match so everything is defined:
(AB)C = A(BC)
This matters because it lets you write ABC without parentheses (the product is unambiguous).
If shapes match:
A(B + C) = AB + AC
(A + B)C = AC + BC
The identity matrix Iₙ ∈ ℝⁿˣⁿ satisfies
Iₙ x = x for all x ∈ ℝⁿ
And for any compatible matrix A:
Iₘ A = A, AIₙ = A
Suppose A scales x by 2 and y by 1:
A = [ [2, 0],
[0, 1] ]
Suppose B swaps x and y:
B = [ [0, 1],
[1, 0] ]
Then AB means “swap, then scale.” BA means “scale, then swap.” Those are different operations, so the matrices differ.
| Operation | When defined? | Rule | Transformation meaning |
|---|---|---|---|
| A + B | same shape m×n | (aᵢⱼ + bᵢⱼ) | add outputs: (A+B)x = Ax + Bx |
| cA | always | (c·aᵢⱼ) | scale outputs: (cA)x = c(Ax) |
| AB | (m×n)(n×p) | cᵢⱼ = ∑ aᵢₖbₖⱼ | compose maps: (AB)x = A(Bx) |
The transpose looks simple (flip across the diagonal), but it appears everywhere: in dot products, in least-squares problems, and in the definition of symmetric matrices.
If A ∈ ℝᵐˣⁿ, then Aᵀ ∈ ℝⁿˣᵐ and
(Aᵀ)ᵢⱼ = aⱼᵢ
So rows become columns.
Example:
A = [ [1, 2, 3],
[4, 5, 6] ] (2×3)
Aᵀ = [ [1, 4],
[2, 5],
[3, 6] ] (3×2)
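In plain Python, this flip is one line with `zip` (the helper name `transpose` is made up):

```python
def transpose(A):
    # row i of A becomes column i of the result
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]                  # 2x3
print(transpose(A))              # [[1, 4], [2, 5], [3, 6]]  (3x2)
```

Applying it twice returns the original matrix, matching property (1) below.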
These are not just algebra tricks; they encode how “flipping” interacts with other operations.
1) Double transpose returns the original:
(Aᵀ)ᵀ = A
2) Transpose distributes over addition and scalar multiplication:
(A + B)ᵀ = Aᵀ + Bᵀ
(cA)ᵀ = cAᵀ
3) Transpose reverses multiplication order:
(AB)ᵀ = Bᵀ Aᵀ
This one is extremely important.
Let A ∈ ℝᵐˣⁿ, B ∈ ℝⁿˣᵖ.
Consider entry (i, j) of (AB)ᵀ.
((AB)ᵀ)ᵢⱼ
= (AB)ⱼᵢ
= ∑ₖ₌₁ⁿ aⱼₖ bₖᵢ
Now look at (BᵀAᵀ)ᵢⱼ. Here Bᵀ ∈ ℝᵖˣⁿ and Aᵀ ∈ ℝⁿˣᵐ, so the product is ℝᵖˣᵐ, matching.
(BᵀAᵀ)ᵢⱼ
= ∑ₖ₌₁ⁿ (bᵀ)ᵢₖ (aᵀ)ₖⱼ
= ∑ₖ₌₁ⁿ bₖᵢ aⱼₖ
= ∑ₖ₌₁ⁿ aⱼₖ bₖᵢ
So the entries match for all i, j ⇒ the matrices are equal.
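The proof can be spot-checked numerically; this plain-Python sketch reuses made-up helpers (`matmul`, `transpose`):

```python
def matmul(A, B):
    # row of A against column of B (columns obtained via zip)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0], [-1, 3, 4]]               # 2x3
B = [[2, 1], [0, -1], [5, 2]]             # 3x2

lhs = transpose(matmul(A, B))             # (AB)^T, shape 2x2
rhs = matmul(transpose(B), transpose(A))  # B^T A^T, shape 2x2
print(lhs == rhs)  # True
```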
For a column vector x ∈ ℝⁿ, xᵀ is a 1×n row vector.
Dot product in matrix notation:
xᵀy = ∑ᵢ xᵢ yᵢ
This is a bridge between “vector algebra” and “matrix algebra.”
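In code, xᵀy is just a sum of matching products:

```python
x = [1, 2, 3]
y = [4, 5, 6]
dot = sum(xi * yi for xi, yi in zip(x, y))
print(dot)  # 1*4 + 2*5 + 3*6 = 32
```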
A system like
2x + y = 5
−x + 3y = 1
can be written as Ax = b with
A = [ [2, 1],
[−1, 3] ], x = [x, y]ᵀ, b = [5, 1]ᵀ
Then solving the system means finding x such that Ax matches b.
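A candidate solution can be verified with a single matrix-vector product. Substitution (y = 5 − 2x from the first equation) gives x = 2, y = 1 for the system above; the sketch below (plain Python, made-up `matvec` helper) confirms Ax = b:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1],
     [-1, 3]]
b = [5, 1]
x = [2, 1]                  # candidate found by substitution
print(matvec(A, x) == b)    # True: [2*2 + 1, -2 + 3*1] = [5, 1]
```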
This node prepares you for Gaussian elimination and matrix methods in Systems of Linear Equations.
In ℝ², a matrix can represent rotations, reflections, scalings, and shears.
The column view is especially geometric: the columns of A are where the basis vectors go.
If e₁ = [1,0]ᵀ and e₂ = [0,1]ᵀ, then
Ae₁ = first column of A
Ae₂ = second column of A
So A is fully determined by where it sends the basis.
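This is quick to confirm: applying A to e₁ and e₂ reads off its columns (plain Python, made-up helper name):

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 0],
     [0, 1]]                 # scale x by 2, leave y alone
e1, e2 = [1, 0], [0, 1]
print(matvec(A, e1))         # [2, 0], the first column of A
print(matvec(A, e2))         # [0, 1], the second column of A
```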
If B does “normalize then mix features” and A does “project to a smaller space,” then AB is the combined pipeline.
This viewpoint shows up in ML models and in the Jacobian, where derivatives themselves form matrices and are multiplied via chain rule (composition).
If a data matrix X stores examples in rows (common) versus columns (also common), transpose switches conventions.
Transpose lets formulas remain consistent even when you change how you store data.
Once you’re fluent in these operations, many “next” ideas, such as solving linear systems and working with Jacobians, become more natural.
Matrices are not just notation; they are a compact language for linear structure.
Let A = [[1, −2], [0, 3]] and B = [[4, 1], [−1, 2]]. Compute C = 2A + B. Then verify on a vector x = [1, 2]ᵀ that Cx = 2(Ax) + Bx.
Compute 2A entrywise:
2A = [[2·1, 2·(−2)], [2·0, 2·3]] = [[2, −4], [0, 6]].
Add B entrywise:
C = 2A + B
= [[2+4, −4+1], [0+(−1), 6+2]]
= [[6, −3], [−1, 8]].
Compute Ax:
Ax = [[1, −2], [0, 3]] [1, 2]ᵀ
= [1·1 + (−2)·2, 0·1 + 3·2]ᵀ
= [1 − 4, 6]ᵀ
= [−3, 6]ᵀ.
Compute 2(Ax) = 2[−3, 6]ᵀ = [−6, 12]ᵀ.
Compute Bx:
Bx = [[4, 1], [−1, 2]] [1, 2]ᵀ
= [4·1 + 1·2, (−1)·1 + 2·2]ᵀ
= [6, 3]ᵀ.
Add outputs:
2(Ax) + Bx = [−6, 12]ᵀ + [6, 3]ᵀ = [0, 15]ᵀ.
Compute Cx directly:
Cx = [[6, −3], [−1, 8]] [1, 2]ᵀ
= [6·1 + (−3)·2, (−1)·1 + 8·2]ᵀ
= [6 − 6, −1 + 16]ᵀ
= [0, 15]ᵀ.
Conclude Cx = 2(Ax) + Bx, matching the transformation interpretation.
Insight: Matrix addition/scaling are designed so that adding/scaling matrices corresponds exactly to adding/scaling the outputs of the linear transformations they represent.
Let A = [[2, 0], [0, 1]] (scale x by 2) and B = [[0, 1], [1, 0]] (swap coordinates). Compute AB and BA, and test on x = [3, 5]ᵀ.
Compute AB using row-by-column:
AB = [[2, 0], [0, 1]] [[0, 1], [1, 0]]
First row vs columns:
c₁₁ = 2·0 + 0·1 = 0
c₁₂ = 2·1 + 0·0 = 2
Second row vs columns:
c₂₁ = 0·0 + 1·1 = 1
c₂₂ = 0·1 + 1·0 = 0
So AB = [[0, 2], [1, 0]].
Compute BA:
BA = [[0, 1], [1, 0]] [[2, 0], [0, 1]]
First row:
d₁₁ = 0·2 + 1·0 = 0
d₁₂ = 0·0 + 1·1 = 1
Second row:
d₂₁ = 1·2 + 0·0 = 2
d₂₂ = 1·0 + 0·1 = 0
So BA = [[0, 1], [2, 0]].
Observe AB ≠ BA.
Test on x = [3, 5]ᵀ:
Bx = [5, 3]ᵀ (swap)
A(Bx) = A[5, 3]ᵀ = [10, 3]ᵀ
So (AB)x = [10, 3]ᵀ.
Other order:
Ax = [6, 5]ᵀ (scale x)
B(Ax) = B[6, 5]ᵀ = [5, 6]ᵀ
So (BA)x = [5, 6]ᵀ.
Insight: Matrix multiplication encodes composition: AB means “do B first, then A.” Different order ⇒ different transformation.
Let A = [[1, 2, 0], [−1, 3, 4]] (2×3) and B = [[2, 1], [0, −1], [5, 2]] (3×2). Compute (AB)ᵀ and compare with BᵀAᵀ.
Check shapes:
A is 2×3 and B is 3×2, so AB is 2×2 and transpose will be 2×2.
Compute AB:
AB = [[1, 2, 0], [−1, 3, 4]] [[2, 1], [0, −1], [5, 2]]
Entry (1,1): 1·2 + 2·0 + 0·5 = 2
Entry (1,2): 1·1 + 2·(−1) + 0·2 = 1 − 2 = −1
Entry (2,1): (−1)·2 + 3·0 + 4·5 = −2 + 20 = 18
Entry (2,2): (−1)·1 + 3·(−1) + 4·2 = −1 − 3 + 8 = 4
So AB = [[2, −1], [18, 4]].
Transpose it:
(AB)ᵀ = [[2, 18], [−1, 4]].
Compute Bᵀ and Aᵀ:
Bᵀ = [[2, 0, 5], [1, −1, 2]] (2×3)
Aᵀ = [[1, −1], [2, 3], [0, 4]] (3×2).
Multiply BᵀAᵀ (2×3)(3×2) = 2×2:
Entry (1,1): [2,0,5]·[1,2,0]ᵀ = 2·1 + 0·2 + 5·0 = 2
Entry (1,2): [2,0,5]·[−1,3,4]ᵀ = 2·(−1) + 0·3 + 5·4 = −2 + 20 = 18
Entry (2,1): [1,−1,2]·[1,2,0]ᵀ = 1·1 + (−1)·2 + 2·0 = −1
Entry (2,2): [1,−1,2]·[−1,3,4]ᵀ = 1·(−1) + (−1)·3 + 2·4 = −1 − 3 + 8 = 4
So BᵀAᵀ = [[2, 18], [−1, 4]].
Compare:
(AB)ᵀ = BᵀAᵀ.
Insight: Transpose flips multiplication order because it swaps the roles of rows and columns; the entrywise sum ∑ aᵢₖbₖⱼ becomes the same sum but viewed from the other side.
A matrix A ∈ ℝᵐˣⁿ represents a linear transformation ℝⁿ → ℝᵐ via x ↦ Ax.
Column view: Ax = x₁a₁ + … + xₙaₙ (a linear combination of columns of A).
Matrix addition and scalar multiplication are componentwise and correspond to adding/scaling transformation outputs.
Matrix multiplication is defined to match composition: (AB)x = A(Bx).
Shape rule for multiplication: (m×n)(n×p) = (m×p); inner dimensions must match.
Multiplication is generally not commutative: AB ≠ BA.
Transpose swaps rows and columns: (Aᵀ)ᵢⱼ = aⱼᵢ, and (AB)ᵀ = BᵀAᵀ.
Trying to add matrices of different shapes (e.g., 2×3 plus 3×2).
Assuming matrix multiplication is commutative because scalar multiplication is (in general it isn’t).
Forgetting the shape rule and multiplying in an invalid order (or expecting the wrong output size).
Thinking multiplication is entrywise (it’s row-by-column sums, designed for composition).
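A small shape guard catches the first three mistakes before any arithmetic runs; `check_shapes` is a hypothetical helper, sketched in plain Python:

```python
def check_shapes(op, A, B):
    # op is "add" (shapes must be identical) or "mul"
    # (columns of A must equal rows of B); returns the result shape.
    ma, na = len(A), len(A[0])
    mb, nb = len(B), len(B[0])
    if op == "add":
        if (ma, na) != (mb, nb):
            raise ValueError(f"cannot add {ma}x{na} and {mb}x{nb}")
        return (ma, na)
    if na != mb:
        raise ValueError(f"cannot multiply {ma}x{na} by {mb}x{nb}")
    return (ma, nb)

A23 = [[0] * 3 for _ in range(2)]     # 2x3
B32 = [[0] * 2 for _ in range(3)]     # 3x2
print(check_shapes("mul", A23, B32))  # (2, 2): the product's shape
```

Trying `check_shapes("add", A23, B32)` raises a ValueError, since a 2×3 plus a 3×2 is undefined.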
Let A = [[3, −1], [2, 4]] and B = [[0, 5], [−2, 1]]. Compute A + B and 3A − 2B.
Hint: Addition/subtraction and scalar multiplication are componentwise. Do 3A and 2B first, then subtract.
A + B = [[3+0, −1+5], [2+(−2), 4+1]] = [[3, 4], [0, 5]].
3A = [[9, −3], [6, 12]]
2B = [[0, 10], [−4, 2]]
3A − 2B = [[9−0, −3−10], [6−(−4), 12−2]] = [[9, −13], [10, 10]].
Let A be 2×3 and B be 3×4. (a) What is the shape of AB? (b) Is BA defined? If it is, what would its shape be?
Hint: Use (m×n)(n×p) = (m×p). For BA, the inner dimensions would be 4 and 2—do they match?
(a) AB has shape (2×3)(3×4) = 2×4.
(b) BA would require (3×4)(2×3), but the inner dimensions 4 and 2 do not match, so BA is not defined.
Given A = [[1, 2], [3, 4]] and B = [[2, 0], [1, −1]], compute (AB)ᵀ and BᵀAᵀ to verify (AB)ᵀ = BᵀAᵀ.
Hint: First compute AB (2×2). Then transpose. Separately compute Bᵀ and Aᵀ and multiply in that order.
AB = [[1,2],[3,4]] [[2,0],[1,−1]]
= [[1·2+2·1, 1·0+2·(−1)], [3·2+4·1, 3·0+4·(−1)]]
= [[4, −2], [10, −4]].
(AB)ᵀ = [[4, 10], [−2, −4]].
Bᵀ = [[2,1],[0,−1]], Aᵀ = [[1,3],[2,4]].
BᵀAᵀ = [[2,1],[0,−1]] [[1,3],[2,4]]
= [[2·1+1·2, 2·3+1·4], [0·1+(−1)·2, 0·3+(−1)·4]]
= [[4, 10], [−2, −4]].
They match, so (AB)ᵀ = BᵀAᵀ.
Next nodes you’re ready for: Systems of Linear Equations (Gaussian elimination) and the Jacobian (derivatives as matrices composed via the chain rule).