Strategic interaction between rational agents. Payoff matrices.
Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.
When two “rational” decision-makers interact, your best move depends on what you believe the other will do—and their best move depends on what they believe you will do. Game theory is the math of that interdependence.
Game theory models strategic interaction using players, strategies, and payoffs uᵢ(s). In normal-form (payoff-matrix) games, each player chooses an action (pure strategy) or a probability distribution over actions (mixed strategy). A Nash equilibrium is a strategy profile s where no player can unilaterally change their strategy and increase their payoff.
In many decisions, the outcome depends not just on your choice, but on others’ choices. If you’re choosing a price, an ad budget, a route in traffic, a security policy, or a bid in an auction, you’re not optimizing against nature—you’re optimizing against other optimizers.
This creates a loop: your best choice depends on your prediction of their choice, and their best choice depends on their prediction of yours.
Game theory provides a language and set of solution ideas for these strategic loops.
A (non-cooperative) game in this node will be described by:
1) Players: indexed by i ∈ {1, …, n}.
2) Strategies: what each player can choose.
We’ll denote player i’s strategy by sᵢ, and the full strategy profile by
s = (s₁, s₂, …, sₙ).
3) Payoffs (utilities): a number for each player given everyone’s chosen strategies.
uᵢ(s) = payoff to player i when the strategy profile is s.
In many introductory settings, each player chooses simultaneously (or without observing the others’ choices). For two-player games with finite actions, we can represent payoffs in a matrix.
Example format (2 players): rows are Player 1's actions, columns are Player 2's actions, and each cell holds the payoff pair (u₁, u₂) for that action combination.
A payoff matrix is not merely “a table of outcomes.” It encodes a dependency structure: each player’s outcome changes with the other’s choice.
This node uses a simple, standard stance: "rational" means only "payoff-maximizing."
We are not assuming players are selfish, spiteful, or altruistic; whatever each player cares about is already encoded in their payoffs uᵢ.
We are assuming the decision rule: “choose what maximizes (expected) payoff,” and we’ll see how that can still produce surprising outcomes.
In single-agent optimization, you often search for an action a that maximizes f(a).
In game theory, you search for a stable joint choice s that makes sense given that each player is optimizing:
s is reasonable if each sᵢ is a best response to s₋ᵢ (everyone else’s strategies).
That stability concept is the Nash equilibrium you’ll unlock later in more depth, but we’ll already define and use it in this intro.
Suppose you play Rock–Paper–Scissors. If you always play Rock, an opponent who learns this can exploit you by always playing Paper.
So sometimes the best “strategy” is not a single action but a randomized plan: choose Rock with probability 1/3, Paper with 1/3, Scissors with 1/3.
Randomization is not (only) about being unpredictable—it can be required for optimality when no pure action is stable.
A pure strategy picks one action with probability 1.
If Player i has action set Aᵢ = {aᵢ¹, …, aᵢᵏ}, then a pure strategy is an element aᵢ ∈ Aᵢ.
We often blur language and say “strategy” when we mean “action” in simple matrix games.
A mixed strategy is a probability distribution over Aᵢ.
Write Player i's mixed strategy as a vector pᵢ where each entry is nonnegative and the entries sum to 1: pᵢ(a) ≥ 0 for all a ∈ Aᵢ, and ∑ₐ pᵢ(a) = 1.
Example: if Aᵢ = {Up, Down}, then pᵢ = (p, 1 − p) with 0 ≤ p ≤ 1.
Even though we write vectors, remember: these are probabilities, so the constraints matter.
Because you already know expected value, the key bridge is:
If outcomes are uncertain due to mixed strategies, each player maximizes expected utility.
For two players with finite actions:
Let U₁ be Player 1’s payoff matrix (numeric entries), where (U₁)ᵣc = u₁(row r, col c).
Then the expected payoff to Player 1 is
E[u₁] = pᵀ U₁ q.
This is exactly the “bilinear form” you may recognize from matrix operations.
Let Player 1 choose row r with probability pᵣ and Player 2 choose column c with probability q_c.
The joint probability of (r, c) under independent mixing is pᵣ q_c.
So
E[u₁] = ∑ᵣ ∑_c (pᵣ q_c) u₁(r, c)
Now observe that this is precisely the matrix expression pᵀ U₁ q.
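The double sum and the bilinear form are the same number. A minimal NumPy sketch (payoffs and mixing probabilities are made up for illustration) confirms the equivalence:

```python
import numpy as np

# A made-up 3x2 payoff matrix for Player 1 (rows = P1 actions, cols = P2 actions).
U1 = np.array([[2.0, -1.0],
               [0.0,  3.0],
               [1.0,  1.0]])

p = np.array([0.2, 0.5, 0.3])  # Player 1's mixed strategy over rows
q = np.array([0.6, 0.4])       # Player 2's mixed strategy over columns

# Bilinear form p^T U1 q.
bilinear = p @ U1 @ q

# Explicit double sum over action pairs, weighted by joint probability p_r * q_c.
double_sum = sum(p[r] * q[c] * U1[r, c]
                 for r in range(U1.shape[0])
                 for c in range(U1.shape[1]))

assert abs(bilinear - double_sum) < 1e-12
```

The matrix form is not just notation: it lets NumPy do the summation in one expression, for action sets of any finite size.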
A payoff matrix lists outcomes (u₁, u₂) per action pair.
But strategically, you should read it as a menu of conditional comparisons: "if the other player chooses X, which of my actions pays me the most?"
That leads to the idea of a best response.
Given the other players’ strategies s₋ᵢ, player i’s best responses are the strategies that maximize uᵢ(sᵢ, s₋ᵢ).
If player i has two possible actions, you can often compute a threshold probability at which the player is indifferent between them.
That indifference condition is the workhorse for solving many 2×2 mixed-strategy equilibria later.
Suppose Player 1 has actions Top/Bottom and Player 2 has actions Left/Right.
Let Player 1’s payoff matrix be
U₁ = [ [2, 0],
[1, 3] ].
If Player 1 plays Top with probability p, then p = (p, 1 − p).
If Player 2 plays Left with probability q, then q = (q, 1 − q).
Compute E[u₁] = pᵀ U₁ q:
First compute U₁ q:
U₁ q = [ 2q + 0(1 − q),
1q + 3(1 − q) ]
= [ 2q,
3 − 2q ].
Then multiply by pᵀ:
E[u₁] = (p, 1 − p) · (2q, 3 − 2q)
= p(2q) + (1 − p)(3 − 2q)
= 2pq + 3 − 2q − 3p + 2pq
= 3 − 2q − 3p + 4pq.
This is a concrete example of how strategies-as-probabilities translate into expected payoffs.
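As a numerical check on the algebra, a short sketch (assumed helper names) compares the matrix form against the simplified polynomial for this U₁ at several sample points:

```python
import numpy as np

U1 = np.array([[2.0, 0.0],
               [1.0, 3.0]])

def eu_matrix(p, q):
    """Expected payoff p^T U1 q, with p on Top and q on Left."""
    return np.array([p, 1 - p]) @ U1 @ np.array([q, 1 - q])

def eu_closed_form(p, q):
    """The simplified expression derived above: 3 - 2q - 3p + 4pq."""
    return 3 - 2*q - 3*p + 4*p*q

# The two routes agree at every (p, q); sample a grid of points.
for p in (0.0, 0.3, 0.5, 1.0):
    for q in (0.0, 0.7, 1.0):
        assert abs(eu_matrix(p, q) - eu_closed_form(p, q)) < 1e-12
```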
If each player is rational and chooses best responses, we need a way to describe outcomes that are consistent with that mutual best-response logic.
A naive idea is: “pick the outcome that maximizes total payoff.” But players are not a single coordinated agent, and they may not be able to commit to cooperate.
So game theory often focuses on stability: announce a strategy profile, then ask each player, "holding everyone else fixed, can you gain by switching?"
If yes, the announced profile is unstable.
A strategy profile s* = (s₁*, …, sₙ*) is a Nash equilibrium if for every player i,
uᵢ(sᵢ*, s₋ᵢ*) ≥ uᵢ(sᵢ, s₋ᵢ*) for all feasible sᵢ.
Interpretation: holding everyone else at their equilibrium strategies, no single player can gain by switching.
This is a local optimality condition in the space of strategy profiles.
Equivalently, define the best response set:
BRᵢ(s₋ᵢ) = { sᵢ : uᵢ(sᵢ, s₋ᵢ) is maximized }.
Then s* is a Nash equilibrium iff
sᵢ* ∈ BRᵢ(s₋ᵢ*) for all i.
This form is extremely practical for matrix games: you can mark best responses and look for intersections.
For a 2-player matrix: mark Player 1's best response(s) within each column and Player 2's best response(s) within each row; cells carrying both marks are the pure Nash equilibria.
This can be done visually, by underlining or circling best-response payoffs and looking for doubly marked cells.
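The marking procedure is mechanical enough to code. A minimal sketch (the function name `pure_nash` and the example payoffs are ours, not from a library):

```python
import numpy as np

def pure_nash(U1, U2):
    """All pure-strategy Nash equilibria (row, col) of a two-player matrix game.

    A cell is an equilibrium iff its u1 is maximal within its column
    (Player 1 best-responds) and its u2 is maximal within its row
    (Player 2 best-responds).
    """
    U1, U2 = np.asarray(U1), np.asarray(U2)
    return [(r, c)
            for r in range(U1.shape[0])
            for c in range(U1.shape[1])
            if U1[r, c] == U1[:, c].max() and U2[r, c] == U2[r, :].max()]

# A made-up symmetric 2x2 game with two coordination-style equilibria.
assert pure_nash([[2, 0], [0, 1]], [[2, 0], [0, 1]]) == [(0, 0), (1, 1)]
```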
Some games (e.g., Matching Pennies, Rock–Paper–Scissors) have no pure Nash equilibrium.
In those cases, a mixed Nash equilibrium may exist. In a mixed equilibrium: each player's randomization makes the other player indifferent among all actions played with positive probability.
Indifference is key because: if one action gave a strictly higher expected payoff, a rational player would put all probability on it rather than mix.
Nash equilibrium blocks profitable single-player deviations.
It does not block profitable joint deviations. Two players might both prefer to coordinate on a different outcome but can’t trust each other to stick to it without some commitment mechanism.
This distinction sets up later topics such as repeated interaction, commitment devices, and cooperative (coalitional) games, where joint deviations and binding agreements are modeled explicitly.
A Nash equilibrium can be inefficient.
To name the tension: a Nash equilibrium is about stability (no profitable unilateral deviation), while a social optimum is about efficiency (maximizing total payoff).
These are different axes, and game theory studies the gap.
| Concept | Question it answers | Depends on beliefs? | Typical output |
|---|---|---|---|
| Best response | “Given others, what should I do?” | Yes (others’ strategies fixed) | Set of strategies |
| Nash equilibrium | “What joint behavior is self-consistent?” | Implicitly (each expects others to play equilibrium) | Strategy profile s* |
| Social optimum | “What maximizes total welfare ∑ᵢ uᵢ?” | No | Outcome/profile |
Keep this table in mind: confusion between Nash equilibrium and social optimum is one of the most common early mistakes.
Intro game theory often uses a few “named” games because each highlights a different phenomenon.
Two players choose Cooperate (C) or Defect (D). A typical payoff matrix is:
|   | C | D |
|---|---|---|
| C | (3, 3) | (0, 5) |
| D | (5, 0) | (1, 1) |
Lesson: rationality + no commitment can yield collectively worse outcomes.
Example (a coordination game):
|   | Left | Right |
|---|---|---|
| Up | (2, 2) | (0, 0) |
| Down | (0, 0) | (2, 2) |
There are two pure Nash equilibria: (Up, Left) and (Down, Right).
Lesson: equilibrium selection matters; history, conventions, or communication can pick one.
Example (Player 1 wants match; Player 2 wants mismatch):
|   | H | T |
|---|---|---|
| H | (1, −1) | (−1, 1) |
| T | (−1, 1) | (1, −1) |
There is no pure Nash equilibrium. The mixed equilibrium is both players choosing H with probability 1/2.
Lesson: randomization can be essential.
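To see why any deterministic bias fails in Matching Pennies, a small sketch: given Player 2's probability q on H, Player 1's pure-action payoffs follow directly from the matrix, and any q other than 1/2 leaves something to exploit (helper name is ours):

```python
def exploit_value(q):
    """Player 1's best pure-action expected payoff in Matching Pennies
    when Player 2 plays H with probability q (payoffs from the matrix above)."""
    eu_H = 1*q + (-1)*(1 - q)   # = 2q - 1
    eu_T = (-1)*q + 1*(1 - q)   # = 1 - 2q
    return max(eu_H, eu_T)

assert exploit_value(0.5) == 0.0    # balanced mixing concedes nothing
assert exploit_value(0.75) == 0.5   # a bias toward H is exploited by playing H
```

Because the game is zero-sum, everything Player 1 extracts here comes out of Player 2's payoff, which is exactly why a biased Player 2 should correct toward 1/2.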
When you face a strategic scenario, a useful disciplined workflow is:
1) List players: Who makes decisions?
2) List actions (strategy sets Aᵢ): What can each player do?
3) Define payoffs uᵢ(s): What does each player value? (profit, time, safety, etc.)
4) Represent the game: for two players with finite actions, write the payoff matrix (a game tree is the natural representation if moves are sequential).
5) Analyze best responses: mark each player's best responses and look for intersections (pure equilibria); if none exist, look for mixed equilibria via indifference.
6) Interpret: Is the equilibrium efficient? If not, what mechanism could change incentives?
A strategy is strictly dominated if there is another strategy that yields a strictly higher payoff no matter what the others do.
If sᵢ is strictly dominated by sᵢ′, then for all s₋ᵢ:
uᵢ(sᵢ′, s₋ᵢ) > uᵢ(sᵢ, s₋ᵢ).
Strictly dominated strategies should never be played by a rational payoff-maximizer.
This can simplify payoff matrices before searching for equilibrium.
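A quick sketch of the elementwise test for strict dominance between pure strategies (domination by a mixed strategy also exists but is not checked here; the function name is ours):

```python
import numpy as np

def strictly_dominated_rows(U1):
    """Row indices strictly dominated by some other pure row:
    the dominating row pays strictly more against every column."""
    U1 = np.asarray(U1)
    return [i for i in range(U1.shape[0])
            if any(np.all(U1[j] > U1[i])
                   for j in range(U1.shape[0]) if j != i)]

# In the Prisoner's Dilemma above, D (row 1) strictly dominates C (row 0).
assert strictly_dominated_rows([[3, 0], [5, 1]]) == [0]
```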
With mixed strategies, expected payoff becomes a bilinear expression:
E[u₁] = pᵀ U₁ q.
So many computations reduce to: matrix-vector products plus solving linear indifference equations.
That’s why your prerequisites (expected value, matrix operations) are exactly the right foundation.
This node gives you the language and baseline equilibrium concept.
Next nodes expand in two directions: deeper equilibrium analysis and computation (Nash Equilibrium), and settings with binding agreements and coalition formation (Cooperative Games).
Keep in mind the central contrast: equilibrium stability is not the same as social optimality.
Consider the 2-player game with actions {A, B} for Player 1 (rows) and {X, Y} for Player 2 (columns). Payoffs are (u₁, u₂):
|   | X | Y |
|---|---|---|
| A | (2, 1) | (0, 0) |
| B | (1, 0) | (1, 2) |
Find all pure-strategy Nash equilibria.
Step 1: Compute Player 1's best responses (maximize u₁) column by column.
If Player 2 plays X: u₁(A, X) = 2 > u₁(B, X) = 1. Best response is A.
If Player 2 plays Y: u₁(A, Y) = 0 < u₁(B, Y) = 1. Best response is B.
Step 2: Compute Player 2's best responses (maximize u₂) row by row.
If Player 1 plays A: u₂(A, X) = 1 > u₂(A, Y) = 0. Best response is X.
If Player 1 plays B: u₂(B, X) = 0 < u₂(B, Y) = 2. Best response is Y.
Step 3: Find intersections where both are best-responding: at (A, X), A is Player 1's best response to X and X is Player 2's best response to A; the same double coincidence holds at (B, Y). So the pure Nash equilibria are (A, X) and (B, Y).
Insight: Pure Nash equilibria are the cells where best-response choices intersect. This game has two equilibria, illustrating equilibrium multiplicity in coordination-like situations.
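The three steps above can be replayed in a few lines (a sketch; the action labels are ours):

```python
# Payoffs from the table: rows A, B; columns X, Y.
U1 = [[2, 0], [1, 1]]
U2 = [[1, 0], [0, 2]]

equilibria = []
for r in range(2):
    for c in range(2):
        best_for_1 = U1[r][c] == max(U1[0][c], U1[1][c])   # column-wise check
        best_for_2 = U2[r][c] == max(U2[r][0], U2[r][1])   # row-wise check
        if best_for_1 and best_for_2:
            equilibria.append((("A", "B")[r], ("X", "Y")[c]))

assert equilibria == [("A", "X"), ("B", "Y")]
```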
Matching Pennies payoff matrix (u₁, u₂):
|   | H | T |
|---|---|---|
| H | (1, −1) | (−1, 1) |
| T | (−1, 1) | (1, −1) |
Let Player 1 play H with probability p, and Player 2 play H with probability q. Find the mixed Nash equilibrium.
Step 1: Write Player 1’s expected payoff from choosing a pure action, given Player 2 mixes with probability q on H.
If Player 1 plays H:
E[u₁ | H] = 1·q + (−1)·(1 − q)
= q − (1 − q)
= 2q − 1.
If Player 1 plays T:
E[u₁ | T] = (−1)·q + 1·(1 − q)
= −q + 1 − q
= 1 − 2q.
Step 2: In a mixed equilibrium where Player 1 randomizes between H and T, Player 1 must be indifferent:
2q − 1 = 1 − 2q
⇒ 2q + 2q = 1 + 1
⇒ 4q = 2
⇒ q = 1/2.
Step 3: Do the symmetric computation for Player 2.
Player 2’s payoffs are the negative of Player 1’s (zero-sum), but we can still compute indifference.
If Player 2 plays H, expected u₂ given Player 1 plays H with probability p:
E[u₂ | H] = (−1)·p + 1·(1 − p)
= −p + 1 − p
= 1 − 2p.
If Player 2 plays T:
E[u₂ | T] = 1·p + (−1)·(1 − p)
= p − 1 + p
= 2p − 1.
Step 4: Indifference for Player 2:
1 − 2p = 2p − 1
⇒ 1 + 1 = 2p + 2p
⇒ 2 = 4p
⇒ p = 1/2.
Insight: Each player mixes to make the other indifferent. Here the only stable behavior is randomizing 50/50, because any bias can be exploited by the opponent.
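The indifference equation generalizes to any 2×2 matrix where mixing is required; a minimal solver sketch (our own helper, not a library routine):

```python
def indifference_q(U1):
    """Probability q on the first column that makes the row player indifferent
    between the two rows of the 2x2 matrix U1 = [[a, b], [c, d]].
    Solves a*q + b*(1 - q) = c*q + d*(1 - q) for q."""
    (a, b), (c, d) = U1
    return (d - b) / ((a - b) - (c - d))

# Matching Pennies payoffs for Player 1: the opponent must put q = 1/2 on H.
assert indifference_q([[1, -1], [-1, 1]]) == 0.5
```

The formula only makes sense when the denominator is nonzero and the result lands in [0, 1]; otherwise one row is a best response for every q and no indifference point is needed.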
Player 1 payoff matrix:
U₁ = [ [4, 0],
[1, 2] ]
Player 1 plays Top with probability p, Player 2 plays Left with probability q. Compute E[u₁] two ways: (1) by explicit expectation, (2) by matrix multiplication.
Step 1: Parameterize mixed strategies.
Player 1: p = (p, 1 − p).
Player 2: q = (q, 1 − q).
Step 2: Explicit expectation over the four outcomes.
E[u₁] = u₁(Top, Left)·P(Top, Left) + u₁(Top, Right)·P(Top, Right) + u₁(Bottom, Left)·P(Bottom, Left) + u₁(Bottom, Right)·P(Bottom, Right)
= 4·(p q) + 0·(p(1 − q)) + 1·((1 − p)q) + 2·((1 − p)(1 − q))
Now simplify:
E[u₁] = 4pq + (1 − p)q + 2(1 − p)(1 − q)
= 4pq + q − pq + 2(1 − p − q + pq)
= 4pq + q − pq + 2 − 2p − 2q + 2pq
= (4pq − pq + 2pq) + (q − 2q) + (2 − 2p)
= 5pq − q + 2 − 2p.
Step 3: Matrix method.
Compute U₁ q:
U₁ q = [ 4q + 0(1 − q),
1q + 2(1 − q) ]
= [ 4q,
q + 2 − 2q ]
= [ 4q,
2 − q ].
Now multiply by pᵀ:
E[u₁] = pᵀ (U₁ q)
= (p, 1 − p) · (4q, 2 − q)
= 4pq + (1 − p)(2 − q)
= 4pq + 2 − q − 2p + pq
= 5pq − q + 2 − 2p.
Insight: The matrix form pᵀ U₁ q is just expected value written compactly. It scales to larger action sets and makes equilibrium computation feel like linear algebra.
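A numeric spot-check that both routes give the same function of (p, q) for this U₁ (sketch; helper names are ours):

```python
import numpy as np

U1 = np.array([[4.0, 0.0],
               [1.0, 2.0]])

def eu_matrix(p, q):
    """Route (2): the bilinear form p^T U1 q."""
    return np.array([p, 1 - p]) @ U1 @ np.array([q, 1 - q])

def eu_explicit(p, q):
    """Route (1) after simplification: 5pq - q + 2 - 2p."""
    return 5*p*q - q + 2 - 2*p

for p in (0.0, 0.4, 1.0):
    for q in (0.0, 0.6, 1.0):
        assert abs(eu_matrix(p, q) - eu_explicit(p, q)) < 1e-12
```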
A game models strategic interdependence: your best choice depends on what others do, and vice versa.
A strategy sᵢ can be a pure action or a mixed strategy (a probability distribution over actions).
Payoffs are utilities uᵢ(s) assigned to each player i for every strategy profile s = (s₁, …, sₙ).
Under mixed strategies, players maximize expected payoff; for 2-player finite games E[u₁] = pᵀ U₁ q.
A Nash equilibrium s* is a strategy profile where no player can improve payoff by unilateral deviation.
Pure Nash equilibria in matrices can be found by intersecting best responses.
Some games have no pure Nash equilibrium; mixed equilibria arise when players randomize to make opponents indifferent.
Nash equilibrium is about stability, not necessarily efficiency or fairness.
Confusing Nash equilibrium with the socially best outcome (maximizing ∑ᵢ uᵢ). They can differ dramatically (e.g., Prisoner’s Dilemma).
Treating mixed strategies as “irrational randomness” rather than a deliberate optimal plan to avoid exploitation or ensure stability.
Forgetting the unilateral nature of Nash equilibrium: it blocks single-player deviations, not coordinated deviations by multiple players.
Misreading payoff matrices (mixing up whose payoff is first, or scanning rows/columns incorrectly when identifying best responses).
Consider Prisoner’s Dilemma with payoffs:
|   | C | D |
|---|---|---|
| C | (3, 3) | (0, 5) |
| D | (5, 0) | (1, 1) |
(a) Find Player 1’s best response to C and to D.
(b) Find Player 2’s best response to C and to D.
(c) Identify the Nash equilibrium (or equilibria).
Hint: Best response means “higher own payoff holding the other fixed.” Compare numbers within a column for Player 1 and within a row for Player 2.
(a) If Player 2 plays C: Player 1 gets 3 from C and 5 from D ⇒ best response is D.
If Player 2 plays D: Player 1 gets 0 from C and 1 from D ⇒ best response is D.
(b) Symmetric reasoning: if Player 1 plays C, Player 2 best response is D (5 > 3). If Player 1 plays D, Player 2 best response is D (1 > 0).
(c) Both players best-respond with D regardless ⇒ (D, D) is the (unique) Nash equilibrium.
In the coordination game:
|   | Left | Right |
|---|---|---|
| Up | (2, 2) | (0, 0) |
| Down | (0, 0) | (2, 2) |
(a) List all pure Nash equilibria.
(b) Is (Up, Right) a Nash equilibrium? Explain using unilateral deviation.
Hint: A pure Nash equilibrium must be a best response for each player at the same cell.
(a) (Up, Left) is a Nash equilibrium: given Left, Player 1 prefers Up (2 > 0); given Up, Player 2 prefers Left (2 > 0). Similarly (Down, Right) is a Nash equilibrium.
(b) (Up, Right) is not: given Right, Player 1 prefers Down (2 > 0), so Player 1 can unilaterally deviate and improve.
Let Player 1’s payoff matrix be
U₁ = [ [3, 1],
[0, 2] ]
Player 1 plays Top with probability p and Player 2 plays Left with probability q.
Compute E[u₁] as a simplified expression in p and q.
Hint: Use E[u₁] = pᵀ U₁ q with p = (p, 1 − p), q = (q, 1 − q).
Compute U₁ q:
U₁ q = [ 3q + 1(1 − q),
0·q + 2(1 − q) ]
= [ 3q + 1 − q,
2 − 2q ]
= [ 1 + 2q,
2 − 2q ].
Then E[u₁] = (p, 1 − p) · (1 + 2q, 2 − 2q)
= p(1 + 2q) + (1 − p)(2 − 2q)
= p + 2pq + 2 − 2q − 2p + 2pq
= 2 − 2q − p + 4pq.
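As with the worked examples, the closed form can be spot-checked numerically (a sketch; the helper `eu` is ours):

```python
import numpy as np

U1 = np.array([[3.0, 1.0],
               [0.0, 2.0]])

def eu(p, q):
    """Expected payoff p^T U1 q with p on Top and q on Left."""
    return np.array([p, 1 - p]) @ U1 @ np.array([q, 1 - q])

# Compare against the derived expression 2 - 2q - p + 4pq on a grid.
for p in (0.0, 0.5, 1.0):
    for q in (0.0, 0.5, 1.0):
        assert abs(eu(p, q) - (2 - 2*q - p + 4*p*q)) < 1e-12
```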
Next, go deeper into equilibrium concepts and computation in Nash Equilibrium. For settings where binding agreements and coalition formation matter, see Cooperative Games.