One player's gain is another's loss; the minimax theorem makes that intuition precise.
Many competitive situations feel like tug-of-war: every point you gain is a point your opponent loses. Zero-sum games are the mathematical model of that intuition—and the minimax theorem tells you exactly what “playing optimally” means when both sides are smart and adversarial.
A zero-sum game can be represented by a single payoff matrix A where Row receives Aᵢⱼ and Column receives −Aᵢⱼ. Players may use mixed strategies (probability distributions over actions). The minimax theorem guarantees the game has a value v and optimal strategies (p, q) such that maxₚ min_q pᵀAq = min_q maxₚ pᵀAq = v. Finding optimal strategies is equivalent to solving a linear program (and its dual).
Game theory has many types of interaction: cooperation, coordination, bargaining, and pure competition. Zero-sum games isolate the pure competition part.
In a zero-sum game, the players’ interests are perfectly opposed: whatever Row gains, Column loses by the same amount. That makes the analysis unusually clean: there is a single notion of “the payoff,” and strategic reasoning becomes a search for guarantees rather than compromise.
A finite two-player zero-sum game is specified by a payoff matrix A ∈ ℝ^(m×n).
So the game is “captured by one matrix.” If you know A, you know both players’ payoffs.
A pure strategy is a single action (choose i deterministically).
A mixed strategy is a probability distribution over actions: Row plays row i with probability pᵢ, and Column plays column j with probability qⱼ.
We’ll write these sets as simplices:
Δ_m = { p ∈ ℝ^m : pᵢ ≥ 0 for all i, ∑ᵢ pᵢ = 1 }, and analogously Δ_n for Column’s mixes q ∈ ℝ^n.
If Row plays p and Column plays q, the expected payoff to Row is
E[payoff] = ∑ᵢ ∑ⱼ pᵢ Aᵢⱼ qⱼ = pᵀAq.
This bilinear form pᵀAq is the central object of zero-sum theory.
A key mindset shift is this: your opponent is not a fixed random environment but an adversary who adapts to your strategy.
So instead of maximizing expected value against a fixed distribution, you often maximize a worst-case expected value.
That leads directly to minimax reasoning.
If you play deterministically in an adversarial setting, you can become predictable—and predictability can be exploited.
Classic example: Rock–Paper–Scissors has no good pure strategy. Any pure move can be countered for a loss. Randomization is not “noise”; it is a strategic tool.
Fix Row’s mixed strategy p. Consider what happens against each pure column j. The expected payoff if Column plays pure j is
uⱼ(p) = ∑ᵢ pᵢ Aᵢⱼ.
This can be seen as the j-th component of the vector pᵀA.
Similarly, fix Column’s mixed strategy q. Against Row’s pure i, the expected payoff is
wᵢ(q) = ∑ⱼ Aᵢⱼ qⱼ,
i.e., the i-th component of Aq.
Now if both mix, the expected payoff is the average of these:
pᵀAq = ∑ⱼ qⱼ (∑ᵢ pᵢ Aᵢⱼ) = ∑ⱼ qⱼ uⱼ(p).
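This identity is easy to sanity-check numerically; the matrix and mixed strategies below are arbitrary illustrative values:

```python
import numpy as np

# Arbitrary 2x3 payoff matrix and mixed strategies (illustrative values only).
A = np.array([[2.0, -1.0, 0.0],
              [1.0, 3.0, -2.0]])
p = np.array([0.6, 0.4])        # Row's mix
q = np.array([0.2, 0.5, 0.3])   # Column's mix

u = p @ A                # u_j(p): expected payoff against each pure column j
bilinear = p @ A @ q     # p^T A q computed directly
weighted = np.dot(q, u)  # sum_j q_j * u_j(p)

assert np.isclose(bilinear, weighted)
print(bilinear)
```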
Given Column plays q, Row chooses p to maximize pᵀAq.
But for any fixed q, the function p ↦ pᵀAq is linear in p, and a linear function over a simplex attains its maximum at an extreme point, i.e., at a pure strategy. That means: Row always has a pure best response, namely a row i maximizing (Aq)ᵢ.
Similarly, a best response to p for Column (who wants to minimize Row’s payoff) can be taken as a pure column j that minimizes (pᵀA)ⱼ.
So why do we need mixed strategies if best responses are pure? Because the equilibrium concept is not “best response to a fixed opponent strategy,” but “mutually stable.” The stability can require mixing so that the opponent becomes indifferent among multiple pure best responses.
A row i is strictly dominated by row k if
Aᵢⱼ < A_kⱼ for all j.
Then Row should never play i (it is worse regardless of Column’s choice). Similarly, Column can delete dominated columns.
This matters because it can reduce the game size before you do heavier computation (LP or solving equations).
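As a preprocessing sketch (pure-strategy dominance only; dominance by a mixture of rows would need an LP per candidate and is not checked here), iterated elimination can be coded directly:

```python
import numpy as np

def eliminate_dominated(A):
    """Iteratively delete rows/columns strictly dominated by another pure strategy.
    Returns the surviving row and column indices of the original matrix."""
    A = np.asarray(A, dtype=float)
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        # Row i is strictly dominated by row k if A[k, j] > A[i, j] for every surviving j.
        for i in list(rows):
            if any(k != i and all(A[k, j] > A[i, j] for j in cols) for k in rows):
                rows.remove(i)
                changed = True
        # Column j is strictly dominated by column l if A[i, l] < A[i, j] for every
        # surviving i (Column wants Row's payoff small).
        for j in list(cols):
            if any(l != j and all(A[i, l] < A[i, j] for i in rows) for l in cols):
                cols.remove(j)
                changed = True
    return rows, cols

# Row 2 is dominated by row 1; afterwards column 1 is dominated by column 2.
print(eliminate_dominated([[3, 1], [1, 0]]))  # ([0], [1])
```

Each pass can expose new dominations (deleting a column can make a row dominated), which is why the loop repeats until nothing changes.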
Think of each pure row i as a vector of payoffs across columns: (Aᵢ1, …, Aᵢn). When Row mixes, p forms a convex combination of these vectors.
Row can choose a point in this convex set; Column then picks the coordinate (a column) that hurts Row most. Row’s problem becomes: pick a convex combination that maximizes the minimum coordinate.
That is already “minimax” in geometric clothing.
In many non-zero-sum games, equilibrium can mean multiple outcomes, negotiation, and mutual gain. In a zero-sum game, optimal play has a sharper promise: Row can guarantee at least some payoff no matter what Column does, and Column can guarantee that Row gets at most some payoff no matter what Row does.
If these guarantees meet, the meeting point is the value of the game.
Define Row’s guaranteed payoff (maximin):
v_maximin = max_{p∈Δ_m} min_{q∈Δ_n} pᵀAq.
Interpretation:
1) Row chooses p first.
2) Column, knowing p, chooses q to minimize Row’s payoff.
3) Row selects p to maximize the resulting worst-case.
Define Column’s guaranteed bound (minimax):
v_minimax = min_{q∈Δ_n} max_{p∈Δ_m} pᵀAq.
Interpretation:
1) Column chooses q first.
2) Row responds to maximize payoff.
3) Column chooses q to make that best-case as small as possible.
A general inequality always holds:
max_{p} min_{q} pᵀAq ≤ min_{q} max_{p} pᵀAq.
Why? For any fixed pair (p, q):
min_{q′∈Δ_n} pᵀAq′ ≤ pᵀAq ≤ max_{p′∈Δ_m} p′ᵀAq,
so min_{q′} pᵀAq′ ≤ max_{p′} p′ᵀAq holds for every p and q. Taking the max over p on the left and the min over q on the right preserves the inequality.
So Row’s guarantee can’t exceed Column’s guarantee. The miracle is that in finite zero-sum games, they are actually equal.
A saddle point is a strategy pair (p⋆, q⋆) such that
pᵀAq⋆ ≤ (p⋆)ᵀAq⋆ ≤ (p⋆)ᵀAq
for all p ∈ Δ_m and q ∈ Δ_n.
Interpretation: neither player can gain by deviating unilaterally. Row cannot raise the payoff by moving away from p⋆, and Column cannot lower it by moving away from q⋆.
In a zero-sum game, a saddle point is exactly a Nash equilibrium (for the two-player zero-sum setting) because one player’s utility is the negative of the other’s.
If the saddle point occurs at pure strategies (i⋆, j⋆), then A_{i⋆j⋆} is simultaneously the minimum of its row and the maximum of its column.
This is sometimes called a pure saddle point.
For finite two-player zero-sum games:
max_{p∈Δ_m} min_{q∈Δ_n} pᵀAq = min_{q∈Δ_n} max_{p∈Δ_m} pᵀAq = v.
Moreover, there exist optimal mixed strategies (p⋆, q⋆) achieving this value v.
This theorem does three jobs at once:
1) Existence: an equilibrium in mixed strategies exists.
2) Equality of guarantees: the order of max and min can be swapped.
3) Operational meaning: v is the payoff under optimal play.
At equilibrium, any pure strategy used with positive probability must be a best response.
Concretely: every row in the support of p⋆ earns the same expected payoff v against q⋆, and every column in the support of q⋆ concedes the same expected payoff v against p⋆.
This gives a practical method for small games: set certain expected payoffs equal and solve.
The minimax theorem is not only about existence—it’s computational.
Row’s problem is:
maximize_{p∈Δ_m} min_{j} ∑ᵢ pᵢ Aᵢⱼ
because Column’s best response to p can be taken as a pure column j that minimizes (pᵀA)ⱼ.
That “maximize a minimum of linear functions” can be converted into an LP by introducing a variable v representing the guaranteed payoff.
Row wants the largest v such that every column j gives expected payoff ≥ v.
Variables: p ∈ ℝ^m, v ∈ ℝ
LP:
maximize v
subject to
∑ᵢ pᵢ Aᵢⱼ ≥ v for each column j = 1, …, n
∑ᵢ pᵢ = 1
pᵢ ≥ 0 for all i
This is linear because v appears linearly and all constraints are linear (in)equalities.
Similarly, Column wants to minimize Row’s payoff. Column chooses q to make every row’s expected payoff ≤ v.
Variables: q ∈ ℝ^n, v ∈ ℝ
minimize v
subject to
∑ⱼ Aᵢⱼ qⱼ ≤ v for each row i = 1, …, m
∑ⱼ qⱼ = 1
qⱼ ≥ 0 for all j
The minimax theorem corresponds to strong duality: both LPs share the same optimal value v.
Some textbooks require A to be strictly positive to avoid sign issues when transforming to canonical forms. Practically, you can always shift payoffs by a constant c:
A' = A + c·1·1ᵀ
where 1 is the all-ones vector. Then
pᵀA'q = pᵀAq + c
So the optimal strategies (p⋆, q⋆) do not change, and the value shifts by +c.
This is a useful trick when converting to alternative LP formulations (like minimizing ∑ xᵢ subject to Aᵀx ≥ 1).
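This shift invariance is easy to verify numerically; the matrix and the constant c = 5 below are arbitrary choices:

```python
import numpy as np

# Sketch: shifting every payoff by a constant c shifts p^T A q by exactly c,
# so best responses and optimal strategies are unchanged; only v moves.
rng = np.random.default_rng(0)
A = np.array([[2.0, -1.0],
              [0.0, 1.0]])
c = 5.0
A_shift = A + c  # same as A + c * 1 * 1^T

for _ in range(100):
    p = rng.dirichlet(np.ones(2))  # random mixed strategies
    q = rng.dirichlet(np.ones(2))
    assert np.isclose(p @ A_shift @ q, p @ A @ q + c)
print("p^T A' q = p^T A q + c for all sampled (p, q)")
```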
At optimality, complementary slackness links the two LPs:
For every column j in the support of q⋆: ∑ᵢ pᵢ⋆ Aᵢⱼ = v (columns outside the support satisfy ≥ v).
For every row i in the support of p⋆: ∑ⱼ Aᵢⱼ qⱼ⋆ = v (rows outside the support satisfy ≤ v).
This matches the indifference principle: actions in the support of a mixed strategy yield equal expected payoff.
| Method | Best for | How it works | Tradeoffs |
|---|---|---|---|
| Indifference / solving equations | 2×2, small 3×3 | Make opponent indifferent across support | Requires guessing support; fragile at scale |
| Dominated strategy elimination | Preprocessing | Remove rows/cols that are never optimal | Not always applicable |
| Linear programming | General finite games | Solve Row’s LP or Column’s LP | Needs LP solver; numerical considerations |
| Specialized algorithms (e.g., simplex variants, subgradient) | Larger games | Exploit structure / sparsity | More advanced |
Since you already know LP, the key conceptual leap is: “optimal play” is an LP feasibility + optimization problem.
Zero-sum games formalize adversarial optimization: one side tries to minimize a quantity while the other tries to maximize it. That pattern appears in: GAN training, robust optimization, adversarial examples and adversarial training, and game-playing AI (chess, poker).
In a vanilla GAN, we have two models: a generator G that maps noise to samples, and a discriminator D that tries to distinguish real data from generated data.
Training is often written as
min_G max_D V(D, G)
This is not always a strict zero-sum game in the clean matrix sense (continuous strategies, non-convex objectives, approximation, etc.), but the conceptual backbone is minimax: one player improves by making the other worse.
1) Value thinking: Instead of “did my loss go down,” ask “what can I guarantee against an improving opponent?”
2) Mixed strategies as distributions: In GANs, stochasticity and diversity in generation can be understood as mixing over outputs; in more theoretical treatments, one may consider mixed strategies over model parameters.
3) No-regret dynamics → equilibrium (preview idea): In many zero-sum settings, if both sides use algorithms with vanishing regret, the time-averaged strategies converge to minimax equilibrium. This is a bridge to modern ML training dynamics.
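The third point can be sketched concretely: if both players run multiplicative weights (Hedge) against each other, their time-averaged strategies approach the minimax solution. A minimal sketch on the 2×2 game solved in Example 1 below (the step size eta and horizon T are hand-picked tuning choices, not canonical values):

```python
import numpy as np

# No-regret self-play: both players run multiplicative weights (Hedge).
A = np.array([[2.0, -1.0],
              [0.0, 1.0]])     # game from Example 1
eta, T = 0.02, 50_000
p = np.ones(2) / 2             # Row's current mix (maximizer)
q = np.ones(2) / 2             # Column's current mix (minimizer)
p_sum, q_sum = np.zeros(2), np.zeros(2)

for _ in range(T):
    p_sum += p
    q_sum += q
    reward = A @ q             # payoff of each pure row vs current q
    loss = p @ A               # payoff conceded by each pure column vs current p
    p = p * np.exp(eta * reward)   # reward rows that did well
    p /= p.sum()
    q = q * np.exp(-eta * loss)    # punish columns that conceded a lot
    q /= q.sum()

p_avg, q_avg = p_sum / T, q_sum / T
print(p_avg, q_avg, p_avg @ A @ q_avg)  # near (1/4, 3/4), (1/2, 1/2), 0.5
```

Note that the last iterates typically cycle around the equilibrium; it is the time averages that converge, which is exactly what the no-regret theory predicts.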
Still, the zero-sum framework gives you the correct mental model for adversarial objectives: equilibrium is about balance, not unilateral improvement.
Whenever you see “minimize a loss that depends on an adversary,” try to rewrite it as:
payoff = pᵀAq (finite case)
or more generally as an expectation under two distributions. Then ask: What can each side guarantee? Does the game have a value? What do (approximate) best responses look like?
That’s the zero-sum mindset you’ll reuse in GANs and robust ML.
Payoff matrix to Row:
A = [ [ 2, −1 ],
[ 0, 1 ] ]
Row chooses a row; Column chooses a column. Column’s payoff is the negative.
Let Row mix: p = (p, 1−p). Column will choose the column that minimizes Row’s expected payoff.
Compute expected payoff if Column plays column 1:
u₁(p) = p·2 + (1−p)·0
= 2p
Compute expected payoff if Column plays column 2:
u₂(p) = p·(−1) + (1−p)·1
= −p + 1 − p
= 1 − 2p
Row wants to maximize min(u₁(p), u₂(p)). The best p occurs where the two are equal (so Column is indifferent):
2p = 1 − 2p
4p = 1
p = 1/4
So Row’s optimal mix is p⋆ = (1/4, 3/4). The value is the common payoff:
v = 2p = 2·(1/4) = 1/2
(check: 1 − 2p = 1 − 1/2 = 1/2).
Now solve for Column’s mix q = (q, 1−q) by making Row indifferent between rows (since Row must be willing to randomize):
Expected payoff of Row choosing row 1 against q:
r₁(q) = 2q + (−1)(1−q)
= 2q − 1 + q
= 3q − 1
Expected payoff of Row choosing row 2 against q:
r₂(q) = 0·q + 1·(1−q)
= 1 − q
Indifference: r₁(q) = r₂(q)
3q − 1 = 1 − q
4q = 2
q = 1/2
Thus Column’s optimal mix is q⋆ = (1/2, 1/2). Verify value:
(p⋆)ᵀAq⋆ = 1/2.
Insight: In a 2×2 zero-sum game without a pure saddle point, optimal mixing makes the opponent indifferent across the actions they might choose. Equilibrium equalizes payoffs on the support.
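The claimed equilibrium can be double-checked numerically: against q⋆ every pure row earns at most v, and against p⋆ every pure column concedes at least v, so no pure (hence no mixed) deviation helps:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [0.0, 1.0]])
p_star = np.array([0.25, 0.75])
q_star = np.array([0.5, 0.5])
v = p_star @ A @ q_star
print(v)  # 0.5

# Saddle check: no pure deviation improves either player's outcome.
assert np.all(A @ q_star <= v + 1e-12)   # each pure row vs q*: at most v
assert np.all(p_star @ A >= v - 1e-12)   # each pure column vs p*: at least v
```

Here both supports are full, so these inequalities hold with equality: both players are exactly indifferent.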
Payoff matrix to Row:
A = [ [ 3, 1, 2 ],
[ 4, 0, −1 ],
[ 2, 2, 1 ] ]
Compute Row’s minimum payoff in each row (Row assumes Column will pick the worst column):
Row 1 min = min(3,1,2) = 1
Row 2 min = min(4,0,−1) = −1
Row 3 min = min(2,2,1) = 1
Row’s maximin over pure rows is max(1, −1, 1) = 1. Candidate rows: row 1 or row 3.
Compute Column’s maximum payoff (to Row) in each column (Column assumes Row will pick the best row):
Col 1 max = max(3,4,2) = 4
Col 2 max = max(1,0,2) = 2
Col 3 max = max(2,−1,1) = 2
Column’s minimax over pure columns is min(4,2,2) = 2.
Since maximin (1) ≠ minimax (2), there is no pure-strategy saddle point. But we can still check if any entry is simultaneously a row-min and column-max (saddle condition):
Row minima occur at (row 1, col 2) with value 1, (row 2, col 3) with value −1, and (row 3, col 3) with value 1.
Column maxima occur at (row 2, col 1) with value 4, (row 3, col 2) with value 2, and (row 1, col 3) with value 2. No entry is simultaneously its row’s minimum and its column’s maximum.
Conclusion: no pure saddle point → optimal play requires mixed strategies (minimax theorem guarantees existence).
Insight: Pure saddle points are rare. The quick diagnostic is: compare max over row-minima vs min over column-maxima. If they differ, mixing is necessary.
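That diagnostic is only a few lines of code (a sketch; `pure_saddle` is a hypothetical helper name). It relies on the standard fact that when the pure maximin equals the pure minimax, any maximin row and minimax column meet at a saddle entry:

```python
import numpy as np

def pure_saddle(A):
    """Return (i, j, value) of a pure saddle point if one exists, else None."""
    A = np.asarray(A, dtype=float)
    maximin = A.min(axis=1).max()  # best guaranteed payoff over pure rows
    minimax = A.max(axis=0).min()  # tightest cap over pure columns
    if maximin != minimax:
        return None                # no pure saddle: mixing is necessary
    i = int(A.min(axis=1).argmax())
    j = int(A.max(axis=0).argmin())
    return i, j, A[i, j]

print(pure_saddle([[3, 1, 2], [4, 0, -1], [2, 2, 1]]))  # None (maximin 1 != minimax 2)
print(pure_saddle([[2, 1], [4, 3]]))                    # (1, 1, 3.0)
```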
Use the game from Example 1:
A = [ [ 2, −1 ],
[ 0, 1 ] ]
We’ll build Row’s LP and read it as “choose p to guarantee v against every column.”
Let p = (p₁, p₂) with p₁ + p₂ = 1 and p₁, p₂ ≥ 0. Introduce value variable v.
For each column j, require expected payoff ≥ v.
Column 1 constraint:
p₁·2 + p₂·0 ≥ v
2p₁ ≥ v
Column 2 constraint:
p₁·(−1) + p₂·1 ≥ v
−p₁ + p₂ ≥ v
Add probability simplex constraint:
p₁ + p₂ = 1
p₁, p₂ ≥ 0
Optimization objective:
maximize v
Solve quickly by substituting p₂ = 1 − p₁:
Constraints become
2p₁ ≥ v
−p₁ + (1 − p₁) ≥ v ⇒ 1 − 2p₁ ≥ v
So v ≤ min(2p₁, 1 − 2p₁). Maximizing v sets them equal:
2p₁ = 1 − 2p₁ ⇒ p₁ = 1/4 ⇒ v = 1/2.
Thus the LP reproduces the minimax solution and makes the guarantee explicit: Row picks p₁ = 1/4 to ensure at least v = 1/2 no matter which column is chosen.
Insight: Each LP constraint corresponds to an adversarial scenario (a column choice). Maximizing v finds the strongest payoff you can guarantee uniformly across all scenarios.
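For larger games, the same construction can be handed to an off-the-shelf LP solver. A minimal sketch using scipy.optimize.linprog (the variable ordering x = (p₁, …, p_m, v) is an implementation choice, not part of the theory):

```python
import numpy as np
from scipy.optimize import linprog

def solve_row_lp(A):
    """Solve Row's maximin LP; return Row's optimal mix p* and the game value v."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Objective: maximize v  ->  linprog minimizes, so minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # One constraint per column j:  v - sum_i p_i A_ij <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to 1 (v unconstrained by this row).
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]  # p_i in [0,1], v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

p_star, v = solve_row_lp([[2, -1], [0, 1]])
print(p_star, v)  # roughly [0.25, 0.75] and 0.5, matching the hand solution
```

Solving Column’s LP (each row’s expected payoff ≤ v) with the same machinery recovers q⋆ and, by strong duality, the same value v.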
A finite two-player zero-sum game is fully described by a single payoff matrix A; Column’s payoff is −Aᵢⱼ.
Mixed strategies are probability vectors p ∈ Δ_m and q ∈ Δ_n; the expected payoff is pᵀAq.
Row’s conservative objective is max_{p} min_{q} pᵀAq (maximin); Column’s is min_{q} max_{p} pᵀAq (minimax).
The minimax theorem guarantees equality of these quantities and the existence of optimal mixed strategies; the common value is v.
A saddle point (p⋆, q⋆) is a Nash equilibrium in zero-sum games and yields payoff v under optimal play.
At equilibrium, actions played with positive probability must yield equal expected payoff (indifference / complementary slackness).
Computing optimal strategies reduces to linear programming; strong duality mirrors minimax equality.
Minimax thinking generalizes to adversarial ML objectives (e.g., GANs), even when assumptions differ.
Confusing “randomizing because you’re unsure” with “randomizing to be unexploitable.” In zero-sum games, mixing is strategic, not merely epistemic.
Assuming Nash equilibrium always means pure strategies. Many zero-sum games (e.g., Rock–Paper–Scissors) have no pure equilibrium.
Forgetting that shifting all payoffs by a constant changes v but does not change optimal strategies.
Mixing up the directions of inequalities when writing LP constraints (Row uses ≥ v for each column; Column uses ≤ v for each row).
Solve the zero-sum game by indifference:
A = [ [ 1, −1 ],
[ −2, 2 ] ]
Find optimal mixed strategies (p⋆, q⋆) and the value v.
Hint: Let Row use p = (p, 1−p). Compute u₁(p), u₂(p) for the two columns and set them equal. Then make Row indifferent to solve for q.
Let p = (p, 1−p).
Column 1 payoff: u₁(p) = p·1 + (1−p)·(−2) = p − 2 + 2p = 3p − 2.
Column 2 payoff: u₂(p) = p·(−1) + (1−p)·2 = −p + 2 − 2p = 2 − 3p.
Equalize: 3p − 2 = 2 − 3p ⇒ 6p = 4 ⇒ p = 2/3.
Value: v = 3(2/3) − 2 = 2 − 2 = 0.
Now Column: q = (q, 1−q).
Row 1 payoff: r₁(q) = 1·q + (−1)(1−q) = q − 1 + q = 2q − 1.
Row 2 payoff: r₂(q) = (−2)q + 2(1−q) = −2q + 2 − 2q = 2 − 4q.
Equalize: 2q − 1 = 2 − 4q ⇒ 6q = 3 ⇒ q = 1/2.
So p⋆ = (2/3, 1/3), q⋆ = (1/2, 1/2), v = 0.
Formulate Row’s linear program (variables and constraints) for a general m×n payoff matrix A. Explain in one sentence what each family of constraints means.
Hint: Introduce v and require that every pure column action yields expected payoff ≥ v against p.
Variables: p ∈ ℝ^m, v ∈ ℝ.
maximize v
subject to:
1) For each column j = 1,…,n: ∑ᵢ pᵢ Aᵢⱼ ≥ v.
2) ∑ᵢ pᵢ = 1.
3) pᵢ ≥ 0 for all i.
Meaning: (1) enforces that Row’s mixed strategy p guarantees at least v even if Column chooses the worst column; (2)-(3) enforce that p is a valid probability distribution.
Consider the game
A = [ [ 0, 1, −1 ],
[ −1, 0, 1 ],
[ 1, −1, 0 ] ]
(a cyclic win/lose structure). Show that p = (1/3, 1/3, 1/3) and q = (1/3, 1/3, 1/3) is a saddle point and find v.
Hint: Compute pᵀA and Aq when p and q are uniform. Use symmetry to argue no player can do better by deviating.
Let u = (1/3, 1/3, 1/3).
Compute uᵀA. Each component is the average of a column, and every column of A sums to 0 (e.g., column 1: 0 + (−1) + 1 = 0).
So uᵀA = (0, 0, 0).
Thus for any q, uᵀAq = (0,0,0)·q = 0.
So if Row plays u, Row guarantees at least 0.
Similarly compute Au. Each component is the average of a row, and every row of A also sums to 0 (e.g., row 1: 0 + 1 + (−1) = 0).
So Au = (0, 0, 0)ᵀ.
Thus for any p, pᵀAu = pᵀ(0,0,0)ᵀ = 0.
So if Column plays u, Column ensures Row’s payoff is at most 0.
Therefore max_{p} min_{q} pᵀAq = min_{q} max_{p} pᵀAq = 0, and (u, u) is a saddle point with value v = 0.
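The symmetry argument can be confirmed mechanically:

```python
import numpy as np

# Cyclic (Rock-Paper-Scissors-style) payoff matrix from the problem.
A = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]], dtype=float)
u = np.ones(3) / 3  # uniform mix

print(u @ A)  # [0. 0. 0.]: Row guarantees at least 0 against every column
print(A @ u)  # [0. 0. 0.]: Column holds Row to at most 0 against every row
```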
Next: Generative Adversarial Networks
Related prior knowledge: Nash Equilibrium, Linear Programming
Helpful follow-ons (if present in your tree): Duality in Linear Programming, No-Regret Learning, Convex–Concave Optimization