LU and QR decomposition: breaking matrices into structured factors.
A deep-dive lesson: the entry point is accessible, but the material is dense, so it is built around worked examples and spaced repetition.
Matrix decomposition is the idea that a “complicated” matrix can often be rewritten as a product of “structured” matrices (triangular, orthogonal, permutation). That one rewrite turns many hard tasks—solving systems, computing determinants, least squares, and inverses—into sequences of easy steps.
Matrix decomposition (factorization) rewrites A as a product like P A = L U or A = Q R. LU connects directly to Gaussian elimination for fast solves; QR uses orthogonality for numerical stability and least squares.
In linear algebra, many problems repeatedly reduce to operations with the same matrix A: solving A x = b for many right-hand sides b, computing det(A), solving least squares problems, and applying A⁻¹.
If you try to do each task “from scratch,” you typically re-run elimination steps or do expensive computations repeatedly. Matrix decomposition avoids that repetition by separating A into factors that are easier to work with.
A matrix decomposition (or factorization) is a representation
A = A₁ A₂ … Aₖ
where the factors Aᵢ have special structure (triangular, orthogonal, permutation, diagonal, etc.). The structure is the entire point: it turns expensive operations into cheap ones.
Two decompositions dominate introductory numerical linear algebra:
1) LU decomposition (with pivoting):
P A = L U
2) QR decomposition:
A = Q R
Triangular matrices make solving systems easy by substitution.
Each such substitution solve costs about O(n²) operations, much cheaper than the O(n³) cost of full elimination.
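To make the cost concrete, here is back substitution in plain Python (a minimal sketch, not a library routine); the nested loop touches roughly n²/2 entries:

```python
def back_substitute(U, b):
    """Solve U x = b for upper-triangular U; about n^2/2 multiplications."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):           # last equation first
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

print(back_substitute([[2.0, 1.0], [0.0, 1.0]], [1.0, 0.0]))   # [0.5, 0.0]
```

Forward substitution for lower-triangular systems is the mirror image, sweeping from the first equation down.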
Orthogonal matrices preserve lengths and angles:
‖Q v‖ = ‖v‖, and Q⁻¹ = Qᵀ.
This is valuable for numerical stability: multiplying by Q doesn’t amplify errors the way arbitrary matrices can.
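A quick numeric illustration: a rotation matrix is orthogonal, so it preserves the length of any vector it multiplies.

```python
from math import sqrt, cos, sin

# A 2x2 rotation is orthogonal: Q^T Q = I, so ||Q v|| = ||v||.
theta = 0.7
Q = [[cos(theta), -sin(theta)],
     [sin(theta),  cos(theta)]]
v = [3.0, 4.0]
Qv = [Q[0][0] * v[0] + Q[0][1] * v[1],
      Q[1][0] * v[0] + Q[1][1] * v[1]]

def norm(u):
    return sqrt(u[0] ** 2 + u[1] ** 2)

print(norm(v), norm(Qv))   # both 5.0 (up to rounding)
```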
| Decomposition | Form | Typical use | Key strength | Key limitation |
|---|---|---|---|---|
| LU (with pivoting) | P A = L U | Many solves A x = b, determinant | Fast repeated solves | Needs pivoting for robustness; typically square A |
| QR | A = Q R | Least squares, stable solves | Numerically stable | Usually more work than LU for square solves |
The rest of this lesson builds these ideas slowly: first LU as “organized Gaussian elimination,” then QR as “orthogonal elimination.”
You already know Gaussian elimination: use row operations to turn A into an upper-triangular matrix, then solve by back substitution.
LU decomposition answers a natural question:
Can we save the elimination work so we can solve A x = b for many b without re-eliminating every time?
Yes. LU stores the elimination multipliers in L and the resulting triangular matrix in U.
Suppose elimination transforms A into U.
Each elimination step subtracts a multiple of a pivot row from rows below.
Those multiples become the subdiagonal entries of L.
The result is:
A = L U
where L is lower-triangular with 1s on the diagonal (the elimination multipliers sit below the diagonal) and U is upper-triangular.
To solve A x = b:
A x = b
L U x = b
Let y = U x. Then:
1) L y = b (forward substitution)
2) U x = y (back substitution)
This is the payoff: after one O(n³) factorization, each new solve is ~O(n²).
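The factor-once, solve-many pattern can be sketched in plain Python (Doolittle LU without pivoting, purely illustrative; real libraries pivot for robustness):

```python
def lu_factor(A):
    """Doolittle LU without pivoting: A = L U, L unit lower-triangular.
    (Illustrative sketch -- production code uses pivoting.)"""
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]       # elimination multiplier
            L[i][k] = m                 # record it in L
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def lu_solve(L, U, b):
    """Forward substitution (L y = b), then back substitution (U x = y)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

L, U = lu_factor([[2.0, 1.0], [4.0, 3.0]])   # one O(n^3) factorization
print(lu_solve(L, U, [1.0, 2.0]))            # each new b costs only ~O(n^2)
print(lu_solve(L, U, [3.0, 7.0]))
```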
In exact arithmetic, LU without pivoting can fail if a pivot is zero.
In floating-point arithmetic, it can also be disastrously inaccurate if a pivot is tiny.
Pivoting fixes this by swapping rows so that the pivot is “safe.” The most common form is partial pivoting: at each step, swap in the row whose candidate pivot has the largest absolute value in the current column.
We then write:
P A = L U
where P is a permutation matrix representing row swaps.
P is an identity matrix with rows permuted.
Multiplying by P on the left permutes rows:
(P A) has the same rows as A, just reordered.
In solving, it means you actually solve:
P A x = P b
L U x = P b
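A small illustration: left-multiplying by a permutation matrix just reorders rows, so libraries often store P as a list of row indices rather than a full matrix (the `permute_rows` helper below is an illustrative name, not a library function):

```python
def permute_rows(perm, M):
    """Return P M, where row i of the result is row perm[i] of M.
    (Illustrative helper; perm is the row-order P selects.)"""
    return [M[p][:] for p in perm]

A = [[0, 1], [2, 3]]
print(permute_rows([1, 0], A))   # [[2, 3], [0, 1]]: the rows, swapped
```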
You already know det(A) = 0 iff A is singular. LU gives a fast route to det(A).
From P A = L U:
det(P A) = det(L U)
Use multiplicativity of determinant:
det(P) det(A) = det(L) det(U)
Now note: det(L) = 1 (L has a unit diagonal), det(U) = ∏ᵢ Uᵢᵢ (U is triangular), and det(P) = ±1 (P is a permutation).
So:
det(A) = det(P)⁻¹ det(L) det(U) = det(P) det(U)
because det(P)⁻¹ = det(P) (since det(P) = ±1).
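A minimal sketch of this determinant formula, using the matrix A = [[0, 1], [2, 3]], whose pivoted factorization needs exactly one row swap:

```python
def det_from_lu(U_diag, num_row_swaps):
    """det(A) = det(P) * prod(diag(U)), valid when L is unit lower-triangular;
    det(P) = (-1)^(number of row swaps)."""
    d = float((-1) ** num_row_swaps)
    for u in U_diag:
        d *= u
    return d

# A = [[0, 1], [2, 3]]: swapping its rows gives P A = U = [[2, 3], [0, 1]],
# L = I, so det(A) = (-1) * (2 * 1) = -2, matching 0*3 - 1*2 directly.
print(det_from_lu([2.0, 1.0], 1))   # -2.0
```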
If you have many right-hand sides b₁, b₂, …, LU is a huge win.
Elimination is applying a sequence of elementary lower-triangular matrices E₁, E₂, … such that:
Eₖ … E₂ E₁ A = U
Those Eᵢ encode row operations. The inverse of a product of lower-triangular matrices is lower-triangular, so:
A = (E₁⁻¹ E₂⁻¹ … Eₖ⁻¹) U
and that product becomes L.
You don’t usually compute Eᵢ explicitly; LU is the compact record of what elimination did.
LU shines for square systems A x = b.
But many real problems are rectangular: A is m×n with more equations than unknowns (m > n), so A x = b usually has no exact solution.
So you solve a least squares problem:
minimize over x: ‖A x − b‖²
QR is designed for this. It also provides a numerically stable way to solve square systems.
For an m×n matrix A (typically m ≥ n), QR decomposition gives:
A = Q R
where Q has orthonormal columns and R is upper-triangular (in full QR, Q is m×m and R is m×n; in thin QR, Q is m×n and R is n×n).
The key property is orthogonality:
Qᵀ Q = I
and for thin QR:
Q has columns q₁, …, qₙ with qᵢᵀ qⱼ = 0 for i ≠ j and ‖qᵢ‖ = 1.
Orthogonal transformations preserve norms:
‖Q v‖ = ‖v‖
and they don’t magnify relative errors much. That makes QR a workhorse in numerical linear algebra.
Also, orthogonality lets you “remove” Q easily since Q⁻¹ = Qᵀ.
We want:
minₓ ‖A x − b‖²
Substitute A = Q R:
‖A x − b‖²
= ‖Q R x − b‖²
Insert I = Q Qᵀ (valid when Q is square, i.e., full QR; the thin case follows from a projection argument, worked out later in this lesson):
= ‖Q R x − Q Qᵀ b‖²
= ‖Q (R x − Qᵀ b)‖²
Now use norm preservation:
= ‖R x − Qᵀ b‖²
So the least squares problem reduces to minimizing a norm involving R, which is triangular.
In the common case m ≥ n with thin QR, R is n×n upper-triangular, and we solve:
R x = Qᵀ b
by back substitution (or solve the top n equations if using full QR).
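As a concrete sketch, the following pure Python builds a thin QR by classical Gram–Schmidt and uses it to fit a line by least squares (illustrative only; production codes use Householder reflections for stability):

```python
from math import sqrt

def thin_qr(A):
    """Thin QR via classical Gram-Schmidt (conceptual; numerically fragile
    for ill-conditioned A -- libraries use Householder reflections)."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]            # j-th column of A
        for k in range(j):
            R[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(m)]
        R[j][j] = sqrt(sum(vi * vi for vi in v))
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

def lstsq_qr(A, b):
    """Minimize ||A x - b|| by solving R x = Q^T b (back substitution)."""
    Q, R = thin_qr(A)
    m, n = len(A), len(R)
    qtb = [sum(Q[i][k] * b[i] for i in range(m)) for k in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (qtb[i] - s) / R[i][i]
    return x

# Fit y = c0 + c1*t through (0,1), (1,2), (2,3); the points are collinear,
# so the least squares fit is exact: c = [1, 1].
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
x = lstsq_qr(A, [1.0, 2.0, 3.0])
print(x)
```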
There are multiple algorithms; two important ones:
1) Gram–Schmidt (conceptual, but can be numerically fragile)
2) Householder reflections (standard in practice)
You don’t need to memorize the full Householder procedure yet, but you should understand the idea: QR uses orthogonal operations to make A triangular without the instability of naive elimination.
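To give a flavor of the idea, here is the single core move of Householder QR: one reflection H = I − 2vvᵀ/(vᵀv) that maps a column onto a multiple of e₁, zeroing everything below the first entry (a sketch of one step, not a full factorization):

```python
from math import sqrt, copysign

def reflect_to_e1(a):
    """Apply one Householder reflection H = I - 2 v v^T / (v^T v), with
    v = a + sign(a1)*||a||*e1, which maps a onto a multiple of e1."""
    norm_a = sqrt(sum(x * x for x in a))
    v = a[:]
    v[0] += copysign(norm_a, a[0])       # sign chosen to avoid cancellation
    vtv = sum(x * x for x in v)
    vta = sum(vi * ai for vi, ai in zip(v, a))
    return [ai - (2.0 * vta / vtv) * vi for ai, vi in zip(a, v)]

# Every entry below the first is driven to zero -- the elimination step of
# Householder QR, done with an orthogonal (hence stable) transformation.
print(reflect_to_e1([3.0, 4.0]))   # [-5.0, 0.0]
```

Full Householder QR applies such a reflection to each successive column, shrinking the active submatrix by one row and column each step.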
If A is square and nonsingular:
A x = b
Q R x = b
Multiply by Qᵀ:
R x = Qᵀ b
Then back substitution solves it.
Compared to LU, QR is often more stable (especially for ill-conditioned problems), but typically more expensive for square systems (roughly twice the arithmetic).
Matrix decomposition is not one tool but a toolbox. The decomposition you choose matches the structure of the task.
| Task | Best default | Why |
|---|---|---|
| Solve A x = b (square), many b | LU with pivoting | Fast repeated solves after factorization |
| Compute det(A) | LU with pivoting | det(A) = det(P) ∏ᵢ Uᵢᵢ |
| Least squares min ‖A x − b‖² | QR | Orthogonality gives stable reduction to triangular solve |
| Solve A x = b (square), high stability needed | QR | Orthogonal transforms are numerically gentle |
Where you’ll see these ideas again:
1) Efficient linear solves in optimization
2) Least squares in data fitting
3) Understanding singularity and rank behavior
In code libraries, you may see variants: partial pivoting (row swaps, the default for LU), complete pivoting (row and column swaps), and column-pivoted QR.
The purpose is the same: reorder rows/columns to find good pivots.
If you remember just that, you’ll choose correctly most of the time—and you’ll understand what the library function is doing when it returns L, U, P or Q, R.
Let A = [[2, 1], [4, 3]] and b = [1, 2]ᵀ. Find L and U such that A = L U, then solve A x = b.
We eliminate the (2,1) entry (the 4) using pivot 2.
Multiplier m = 4/2 = 2.
After elimination, row2 ← row2 − m·row1:
row2: [4, 3] − 2·[2, 1] = [4, 3] − [4, 2] = [0, 1].
So U = [[2, 1], [0, 1]].
Form L using the multiplier m below the diagonal and 1s on the diagonal:
L = [[1, 0], [m, 1]] = [[1, 0], [2, 1]].
Check A = L U:
L U = [[1, 0], [2, 1]] · [[2, 1], [0, 1]]
= [
[1·2 + 0·0, 1·1 + 0·1],
[2·2 + 1·0, 2·1 + 1·1]
]
= [[2, 1], [4, 3]] = A.
Solve A x = b via L U x = b.
Let y = U x. First solve L y = b (forward substitution):
[[1, 0], [2, 1]] · [y₁, y₂]ᵀ = [1, 2]ᵀ.
Equation 1: y₁ = 1.
Equation 2: 2y₁ + y₂ = 2 ⇒ 2·1 + y₂ = 2 ⇒ y₂ = 0.
Now solve U x = y (back substitution):
[[2, 1], [0, 1]] · [x₁, x₂]ᵀ = [1, 0]ᵀ.
Equation 2: x₂ = 0.
Equation 1: 2x₁ + x₂ = 1 ⇒ 2x₁ = 1 ⇒ x₁ = 1/2.
Insight: LU turns one elimination into a reusable factorization: each new b only needs forward + back substitution.
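A quick numeric check of the worked example:

```python
# Multiply L U back together and verify A x = b for x = [1/2, 0].
L = [[1.0, 0.0], [2.0, 1.0]]
U = [[2.0, 1.0], [0.0, 1.0]]
A = [[sum(L[i][k] * U[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
x = [0.5, 0.0]
b = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(A, b)   # [[2.0, 1.0], [4.0, 3.0]] [1.0, 2.0]
```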
Let A = [[0, 1], [2, 3]]. Show that LU without pivoting fails, then find P, L, U such that P A = L U.
Attempt LU without pivoting: the first pivot would be A₁₁ = 0.
We cannot eliminate below using pivot 0 (division by zero), so LU (without pivoting) fails.
Use a row swap to bring a nonzero pivot to the top.
Swap row1 and row2. The permutation matrix is:
P = [[0, 1], [1, 0]].
Compute P A:
P A = [[0, 1], [1, 0]] · [[0, 1], [2, 3]]
= [[2, 3], [0, 1]].
Now P A is already upper-triangular, so we can take:
U = [[2, 3], [0, 1]].
Because no elimination was needed (no subdiagonal to eliminate), L is just the identity:
L = [[1, 0], [0, 1]].
Verify:
L U = I · U = U = P A.
Insight: Pivoting isn’t an optional detail: it makes LU exist (and behave well) by ensuring usable pivots.
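Checking P A with an explicit matrix multiply:

```python
def matmul(X, Y):
    """Naive matrix product, enough to verify the 2x2 factorization."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[0, 1], [1, 0]]
A = [[0, 1], [2, 3]]
print(matmul(P, A))   # [[2, 3], [0, 1]] -- upper-triangular, so U = P A, L = I
```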
Suppose A is m×n with m ≥ n and A = Q R (thin QR, so Q is m×n with orthonormal columns and R is n×n upper-triangular). Show how min ‖A x − b‖ reduces to a triangular system.
Start with the least squares objective:
minₓ ‖A x − b‖².
Substitute A = Q R:
‖A x − b‖² = ‖Q R x − b‖².
Use the fact that Q has orthonormal columns. Decompose b into components in the column space of Q and orthogonal to it:
b = Q(Qᵀ b) + (b − Q(Qᵀ b)).
Here Q(Qᵀ b) is the projection onto Col(Q) = Col(A).
Rewrite the residual:
Q R x − b
= Q R x − Q(Qᵀ b) − (b − Q(Qᵀ b))
= Q(R x − Qᵀ b) − (b − Q(Qᵀ b)).
Now take norms. The two terms are orthogonal (one lies in Col(Q), the other in its orthogonal complement), so by the Pythagorean theorem:
‖Q(R x − Qᵀ b) − (b − Q(Qᵀ b))‖²
= ‖Q(R x − Qᵀ b)‖² + ‖b − Q(Qᵀ b)‖².
Use norm preservation on the first term (Q has orthonormal columns):
‖Q(R x − Qᵀ b)‖ = ‖R x − Qᵀ b‖.
The second term ‖b − Q(Qᵀ b)‖² does not depend on x, so minimizing the whole expression is equivalent to minimizing:
‖R x − Qᵀ b‖².
Thus the least squares minimizer x solves the triangular least squares problem, and when R is nonsingular (full column rank):
R x = Qᵀ b
which is solved by back substitution.
Insight: QR converts “fit a vector in a subspace” into a stable triangular solve, avoiding the error amplification of normal equations.
Matrix decomposition rewrites A as a product of structured factors to make downstream computations easier.
LU (with pivoting) factors as P A = L U, where L is lower-triangular (often unit diagonal) and U is upper-triangular.
Once LU is computed, solving A x = b becomes two cheap steps: L y = P b, then U x = y.
Pivoting (P) is essential for avoiding zero or tiny pivots and improving numerical robustness.
For triangular matrices, determinants are easy: det(U) = ∏ᵢ Uᵢᵢ, and with LU, det(A) = det(P) det(U) (if L is unit lower-triangular).
QR factors A = Q R with Q orthogonal/unitary and R upper-triangular; Qᵀ = Q⁻¹ makes it algebraically convenient.
QR is the standard stable method for least squares: min ‖A x − b‖ reduces to solving R x = Qᵀ b (when full rank).
Forgetting pivoting in LU and being surprised when a zero (or tiny) pivot breaks the factorization or causes large numerical error.
Mixing up where the permutation matrix goes: common conventions are P A = L U (row pivoting) or A = Pᵀ L U; libraries may return a pivot vector instead of P.
Assuming Q is always square m×m; in thin QR, Q is m×n with orthonormal columns, and R is n×n.
Using normal equations Aᵀ A x = Aᵀ b by default for least squares without realizing it can be less stable than QR (because it can worsen conditioning).
Compute an LU decomposition (no pivoting) of A = [[1, 2], [3, 8]] into A = L U with L unit lower-triangular. Then solve A x = b for b = [5, 15]ᵀ using the factors.
Hint: Eliminate the 3 under the pivot 1. The multiplier is m = 3/1. Put m into L₂₁, and the resulting upper-triangular matrix is U.
Eliminate: m = 3.
U comes from row2 ← row2 − 3·row1:
row2: [3, 8] − 3·[1, 2] = [0, 2].
So U = [[1, 2], [0, 2]].
L = [[1, 0], [3, 1]].
Solve L y = b:
y₁ = 5.
3y₁ + y₂ = 15 ⇒ 15 + y₂ = 15 ⇒ y₂ = 0.
Solve U x = y:
2x₂ = 0 ⇒ x₂ = 0.
x₁ + 2x₂ = 5 ⇒ x₁ = 5.
So x = [5, 0]ᵀ.
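A quick check that the computed x solves the original system:

```python
# Verify the practice answer: A x should reproduce b = [5, 15].
A = [[1.0, 2.0], [3.0, 8.0]]
x = [5.0, 0.0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(Ax)   # [5.0, 15.0]
```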
Let P A = L U be an LU decomposition with L unit lower-triangular. Suppose U has diagonal entries (2, −1, 5) and P corresponds to exactly one row swap. Compute det(A).
Hint: Use det(P) det(A) = det(L) det(U). For unit lower-triangular L, det(L) = 1. One row swap means det(P) = −1.
det(U) = 2 · (−1) · 5 = −10.
With one row swap, det(P) = −1.
From det(P) det(A) = det(L) det(U) = 1 · (−10) = −10,
(−1) det(A) = −10 ⇒ det(A) = 10.
Assume A = Q R is a thin QR decomposition with Qᵀ Q = I and R upper-triangular and invertible. Show that the unique minimizer of min ‖A x − b‖ is the solution to R x = Qᵀ b.
Hint: Start from ‖A x − b‖ = ‖Q R x − b‖. Add and subtract Q(Qᵀ b) and use orthogonality to split the norm into two parts, one depending on x and one not.
We minimize f(x) = ‖A x − b‖² = ‖Q R x − b‖².
Write b = Q(Qᵀ b) + (b − Q(Qᵀ b)).
Then:
Q R x − b = Q(R x − Qᵀ b) − (b − Q(Qᵀ b)).
The two terms are orthogonal, so:
‖Q(R x − Qᵀ b) − (b − Q(Qᵀ b))‖²
= ‖Q(R x − Qᵀ b)‖² + ‖b − Q(Qᵀ b)‖².
Because Q has orthonormal columns, ‖Q v‖ = ‖v‖ for v ∈ ℝⁿ, hence:
= ‖R x − Qᵀ b‖² + constant.
Therefore minimizing f(x) is equivalent to minimizing ‖R x − Qᵀ b‖².
Since R is invertible, the unique minimizer satisfies R x = Qᵀ b, giving x = R⁻¹ Qᵀ b.