P(A|B) - probability of A given B has occurred.
Deep-dive lesson - accessible entry point but dense material. Use worked examples and spaced repetition.
A fair coin has P(H) = 1/2. But if you’re told “the coin landed heads,” the probability of heads becomes 1. Conditional probability is the formal way to express this idea: once you learn B happened, you measure everything inside the world where B is true.
Conditional probability is the probability of event A given event B: P(A|B). It means you restrict your attention to outcomes in B, and then ask what fraction of those outcomes also lie in A. Formally (when P(B) > 0): P(A|B) = P(A ∩ B) / P(B).
In basic probability, you pick a sample space Ω (all possible outcomes) and assign probabilities to events (subsets of Ω). That’s great when you have no extra information.
But real reasoning almost always comes with information: “What is the chance the roll is a 6, given that it is even?” “What is the chance of disease, given that the test came back positive?”
That phrase “given that …” is exactly what conditional probability captures.
Think of an event B as a filter. Once you learn B occurred, outcomes not in B are impossible, so you throw them away.
Conditional probability asks: inside this new world B, how likely is A?
So P(A|B) is not “A and B” (that’s intersection). It’s “A measured within B.”
If P(B) > 0, the conditional probability of A given B is
P(A|B) = P(A ∩ B) / P(B)
Read it slowly: the numerator P(A ∩ B) is the probability that both events occur; the denominator P(B) is the probability of what you learned; the ratio is the share of B’s probability that also lies in A.
Imagine 100 equally likely outcomes, with |B| of them in B and |A ∩ B| of them in both A and B.
Then:
P(A|B) = (|A ∩ B|/100) / (|B|/100) = |A ∩ B| / |B|
The key idea: you don’t compare A ∩ B to Ω anymore; you compare it to B.
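This “compare to B, not Ω” rule is easy to check by counting. A minimal Python sketch, where the particular choice of events (multiples of 5 and multiples of 2 among 100 outcomes) is purely illustrative:

```python
# Conditional probability by counting in an equally likely sample space.
# Illustrative events: B = multiples of 2, A = multiples of 5, among 0..99.
omega = range(100)
B = {w for w in omega if w % 2 == 0}   # 50 outcomes
A = {w for w in omega if w % 5 == 0}   # 20 outcomes

p_A_given_B = len(A & B) / len(B)      # compare A ∩ B to B, not to Ω
p_A_and_B = len(A & B) / len(omega)    # "A and B" compares to Ω

print(p_A_given_B)  # 0.2  (10 multiples of 10, out of 50 evens)
print(p_A_and_B)    # 0.1  (10 multiples of 10, out of 100 outcomes)
```

Note how the two numbers differ only in the denominator: that is the entire distinction between “given” and “and”.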
The definition requires P(B) > 0.
Why? Because if P(B) = 0, then “given B happened” is conditioning on something that never happens (in your probability model). The fraction P(A ∩ B)/P(B) would divide by zero.
At difficulty level 2, the important takeaway is: only condition on events with positive probability (for basic discrete problems).
Most mistakes with conditional probability come from not fully committing to the new sample space. Learners often keep using the original denominator (Ω) instead of the conditioned denominator (B).
Once you know B happened: outcomes outside B become impossible, and the probabilities of the outcomes inside B must be rescaled so they sum to 1.
In equally likely discrete settings:
P(A|B) = |A ∩ B| / |B|
This is the same as the earlier formula, just using counts instead of probabilities.
Let Ω = {1,2,3,4,5,6}, with A = “roll a 6” = {6} and B = “roll is even” = {2,4,6}.
After conditioning on B, the new universe is {2,4,6}.
Within B: the three remaining outcomes are equally likely, and exactly one of them (the 6) lies in A.
So:
P(A|B) = 1/3
Notice how different this is from P(A) = 1/6. The evidence “even” made 6 more plausible.
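The same example can be checked by counting in code (reading the example as A = {6}, B = {2, 4, 6}):

```python
# The die example, checked by counting.
omega = {1, 2, 3, 4, 5, 6}
A = {6}            # "roll a 6"
B = {2, 4, 6}      # "roll is even"

p_A = len(A) / len(omega)            # 1/6 before any evidence
p_A_given_B = len(A & B) / len(B)    # 1/3 once "even" is known
```

Learning B shrank the denominator from 6 outcomes to 3, which doubled the probability of A.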
Conditional probability can be approached in two equivalent ways:
| View | What you do | When it’s easiest |
|---|---|---|
| Restrict-and-count | Rewrite the sample space as B, then count A within it | Equally likely outcomes (dice, cards) |
| Use the formula | Compute P(A ∩ B) and P(B) from given probabilities | Non-uniform probabilities, word problems |
If you are inside B, the complement of A becomes “not A, but still inside B.”
P(Aᶜ|B) = 1 − P(A|B)
This is often an easy way to compute a conditional probability when the direct event is awkward.
Conditioning on B does two things:
1) Deletes outcomes not in B
2) Scales the remaining probabilities so they sum to 1
If the outcomes in B were originally equally likely, they remain equally likely relative to each other after conditioning.
If outcomes in B were not equally likely, you can still condition, but you must keep their original weights and then re-normalize.
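A short sketch of that re-normalization with non-uniform weights, using a hypothetical loaded die where 6 is twice as likely as any other face:

```python
# Conditioning a non-uniform distribution on B = "roll is even".
# Hypothetical loaded die: face 6 carries twice the weight of each other face.
weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}   # unnormalized weights
B = {2, 4, 6}

weight_B = sum(weights[w] for w in B)             # 1 + 1 + 2 = 4

# Keep the original weights inside B, then re-normalize so they sum to 1.
cond = {w: weights[w] / weight_B for w in B}

print(cond[6])   # 0.5  (unconditionally it was 2/7)
```

The relative weights inside B (1 : 1 : 2) are unchanged; only the scale factor is new, exactly as the derivation below makes precise.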
Suppose we want a new probability measure P(·|B) that lives on the restricted universe B.
We want: events inside B to keep their original proportions, and B itself to have probability 1 in the new measure.
So we set
P(A|B) = c · P(A ∩ B)
Choose c so that P(B|B) = 1:
1 = P(B|B) = c · P(B ∩ B) = c · P(B)
So c = 1 / P(B).
Therefore:
P(A|B) = P(A ∩ B) / P(B)
This is the cleanest justification for the definition: it’s the unique way to “renormalize” probabilities inside B.
Sometimes you don’t want P(A|B). Instead, you want the probability that both events happen, P(A ∩ B). Conditional probability gives a direct bridge between these.
Starting from the definition:
P(A|B) = P(A ∩ B) / P(B)
Multiply both sides by P(B):
P(A ∩ B) = P(A|B) · P(B)
This is called the multiplication rule.
Be careful with order: both are true (when denominators are nonzero).
P(A ∩ B) = P(A|B)P(B)
P(A ∩ B) = P(B|A)P(A)
They describe the same intersection, just conditioning in different directions.
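A quick numeric check of both factorizations, with hypothetical values chosen for the three probabilities:

```python
# Both factorizations recover the same intersection.
# Hypothetical numbers: P(A) = 0.4, P(B) = 0.5, P(A ∩ B) = 0.1.
p_A, p_B, p_AB = 0.4, 0.5, 0.1

p_A_given_B = p_AB / p_B     # 0.2
p_B_given_A = p_AB / p_A     # 0.25

# Same intersection, conditioned in different directions:
assert abs(p_A_given_B * p_B - p_AB) < 1e-12
assert abs(p_B_given_A * p_A - p_AB) < 1e-12
```

The two conditionals differ (0.2 vs 0.25), but multiplying each by its own denominator lands on the same P(A ∩ B).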
The intersection P(A ∩ B) is often hard to estimate directly, but conditional probabilities are natural in real situations.
Example narrative: let B = “it rains today” and A = “traffic is bad”. Experience gives you P(A|B) directly (how often traffic is bad on rainy days), and a forecast gives you P(B).
Then P(A ∩ B) = probability it rains and traffic is bad.
Conditional probability lets you build probabilities step-by-step. For three events A, B, C with appropriate nonzero probabilities:
P(A ∩ B ∩ C) = P(A|B ∩ C) · P(B|C) · P(C)
Interpretation: start from C, then within C consider B, then within (B ∩ C) consider A.
At this node’s level, you don’t need to memorize the general chain rule, but it’s useful to see that conditional probability is the building block for multi-step reasoning.
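The chain rule can be exercised on a concrete scenario (an assumed one, not from the text above): drawing three aces in a row from a standard 52-card deck, conditioning step by step.

```python
from fractions import Fraction

# Chain rule sketch: P(three aces in a row) without replacement.
# C = "first card is an ace", B = "second is an ace", A = "third is an ace".
p_C = Fraction(4, 52)            # 4 aces among 52 cards
p_B_given_C = Fraction(3, 51)    # 3 aces left among 51 cards
p_A_given_BC = Fraction(2, 50)   # 2 aces left among 50 cards

p_ABC = p_A_given_BC * p_B_given_C * p_C
print(p_ABC)   # 1/5525
```

Each factor is a conditional probability inside the world built by the previous draws, which is exactly the step-by-step reading of P(A|B ∩ C) · P(B|C) · P(C).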
Intersection is symmetric:
A ∩ B = B ∩ A
But conditional probability generally is not:
P(A|B) ≠ P(B|A)
Example: pick a positive integer at random from a finite range; let A = “divisible by 2” and B = “divisible by 6”.
If a number is divisible by 6, it must be divisible by 2, so P(A|B) = 1.
But if a number is divisible by 2, it is not necessarily divisible by 6, so P(B|A) < 1.
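The asymmetry is easy to see numerically. A sketch on the (assumed) finite universe 1..60:

```python
# Asymmetry check on a hypothetical universe: pick uniformly from 1..60.
# A = "divisible by 2", B = "divisible by 6".
omega = range(1, 61)
A = {n for n in omega if n % 2 == 0}   # 30 numbers
B = {n for n in omega if n % 6 == 0}   # 10 numbers

p_A_given_B = len(A & B) / len(B)      # 1.0: every multiple of 6 is even
p_B_given_A = len(A & B) / len(A)      # 10/30: an even number need not be a multiple of 6
```

Same intersection in both numerators, different denominators, very different answers.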
This non-symmetry is the entire reason Bayes’ Theorem is interesting later: it provides a way to relate the two directions.
A lot of conditional probability skill is language parsing.
A practical technique: rewrite the question as
“Among outcomes where B is true, what fraction have A true?”
Consider medical testing language:
Two different conditionals: P(test is positive | patient has the disease), a property of the test, and P(patient has the disease | test is positive), what the patient actually wants to know.
These are not the same. Conditional probability makes that distinction precise.
Even before Bayes’ Theorem, you can see the structure: equating the two factorizations of the intersection, P(A|B)P(B) = P(B|A)P(A), already relates the two directions.
This node equips you to keep the symbols straight so Bayes later feels like algebra, not magic.
Independence will be unlocked soon. Conditional probability is the quickest way to express it.
A and B are independent exactly when
P(A|B) = P(A)
(assuming P(B) > 0)
Interpretation: learning B doesn’t change your belief about A.
Equivalently:
P(A ∩ B) = P(A)P(B)
But conceptually, the conditional form is often more intuitive: no update.
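A sketch of “no update” on a fair die, with a hypothetical choice of events that happen to be independent:

```python
from fractions import Fraction

# Independence as "no update" on a fair die.
# Hypothetical events: A = "roll is even", B = "roll is at most 4".
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3, 4}

p_A = Fraction(len(A), len(omega))            # 1/2
p_B = Fraction(len(B), len(omega))            # 2/3
p_A_given_B = Fraction(len(A & B), len(B))    # 2/4 = 1/2: learning B changes nothing
p_AB = Fraction(len(A & B), len(omega))       # 2/6 = 1/3

assert p_A_given_B == p_A   # conditional form of independence
assert p_AB == p_A * p_B    # equivalent product form
```

Half of all outcomes are even, and half of the outcomes inside B are even, so conditioning on B leaves the belief about A untouched.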
A Markov chain is about transitions like
P(Xₜ₊₁ = j | Xₜ = i)
That is literally conditional probability: the next state given the current state.
So this node is foundational: without comfort reading and manipulating P(·|·), transition matrices and “memoryless” properties will feel opaque.
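As a sketch of that connection, here is a tiny transition matrix (the two-state weather chain and its numbers are invented for illustration):

```python
# Markov transition probabilities ARE conditional probabilities.
# Hypothetical 2-state weather chain: state 0 = sunny, state 1 = rainy.
# P[i][j] = P(X_{t+1} = j | X_t = i)
P = [
    [0.9, 0.1],   # given sunny today: 90% sunny, 10% rainy tomorrow
    [0.5, 0.5],   # given rainy today: 50/50
]

# Each row is a conditional distribution, so each row must sum to 1.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

# P(X_2 = rainy | X_0 = sunny): condition step by step over the middle state.
p = P[0][0] * P[0][1] + P[0][1] * P[1][1]   # 0.9*0.1 + 0.1*0.5 ≈ 0.14
```

The two-step computation is the multiplication rule applied twice, once per path through the intermediate state.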
1) Identify events A and B clearly.
2) Confirm P(B) > 0 (in discrete problems, B must have at least one outcome).
3) Decide approach: restrict-and-count (equally likely outcomes) or the formula P(A ∩ B)/P(B) (probabilities given directly).
4) Be explicit about the denominator: after conditioning, your denominator is B.
5) Sanity-check: 0 ≤ P(A|B) ≤ 1; in particular, if A ⊆ B then P(A|B) = P(A)/P(B), which must still come out ≤ 1.
That last point is a great self-check: if you compute something bigger than 1, your denominator or event interpretation is wrong.
A standard 52-card deck. Let A = “card is an Ace”. Let B = “card is a Spade”. Find P(A|B).
Step 1: Translate the meaning.
P(A|B) means: among the spades, what fraction are aces?
Step 2: Count the conditioned universe.
B = “Spade” → there are 13 spades.
So |B| = 13.
Step 3: Count the overlap.
A ∩ B = “Ace and Spade” → only the Ace of Spades.
So |A ∩ B| = 1.
Step 4: Compute the conditional probability using counts.
P(A|B) = |A ∩ B| / |B| = 1 / 13.
Insight: Conditioning turned a 52-outcome space into a 13-outcome space. The denominator must match the condition.
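The count can be reproduced by building the deck explicitly:

```python
# Worked Example 1 by enumeration: aces among spades.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]     # 52 cards

B = [c for c in deck if c[1] == "spades"]         # 13 spades: the new universe
A_and_B = [c for c in B if c[0] == "A"]           # just the Ace of Spades

p = len(A_and_B) / len(B)                         # 1/13
```

Filtering the deck down to `B` first is the restrict-and-count view in code: the 52-card list never appears in the final division.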
Suppose P(B) = 0.30 and P(A|B) = 0.20. Find (1) P(A ∩ B) and (2) P(Aᶜ|B).
Part (1): Use the multiplication rule.
We know:
P(A ∩ B) = P(A|B)P(B)
Compute:
P(A ∩ B) = 0.20 · 0.30 = 0.06
Part (2): Use the complement rule inside the condition.
P(Aᶜ|B) = 1 − P(A|B)
Compute:
P(Aᶜ|B) = 1 − 0.20 = 0.80
Insight: Once you know one conditional probability, you can often get several others quickly using algebraic identities (multiplication and complements).
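The same two identities as code, using the numbers from the example:

```python
# Worked Example 2: multiplication rule and complement inside the condition.
p_B = 0.30
p_A_given_B = 0.20

p_A_and_B = p_A_given_B * p_B      # multiplication rule: 0.06
p_Ac_given_B = 1 - p_A_given_B     # complement rule, still inside B: 0.80
```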
Roll a fair six-sided die. Let A = “roll is greater than 3” and B = “roll is odd”. Compute P(A|B) and compare to P(A).
Step 1: List outcomes.
Ω = {1,2,3,4,5,6}
A = {4,5,6}
B = {1,3,5}
Step 2: Restrict to B.
Given B occurred, possible outcomes are {1,3,5}. So |B| = 3.
Step 3: Find overlap A ∩ B.
A ∩ B = {5}. So |A ∩ B| = 1.
Step 4: Compute conditional.
P(A|B) = |A ∩ B| / |B| = 1/3.
Step 5: Compute unconditional for comparison.
P(A) = |A| / |Ω| = 3/6 = 1/2.
Insight: Learning “odd” made outcomes {4,6} impossible, which reduced the chance of being > 3 from 1/2 down to 1/3.
Conditional probability measures A within the world where B is known to occur.
Definition (requires P(B) > 0): P(A|B) = P(A ∩ B) / P(B).
Conditioning restricts the sample space to B; the denominator becomes B, not Ω.
Multiplication rule: P(A ∩ B) = P(A|B)P(B) (and also = P(B|A)P(A)).
Conditional probability is generally not symmetric: P(A|B) ≠ P(B|A).
Complement works inside conditions: P(Aᶜ|B) = 1 − P(A|B).
Independence can be expressed as “no update”: P(A|B) = P(A) (when P(B) > 0).
Using the original sample space Ω as the denominator instead of using the conditioned event B.
Confusing P(A|B) with P(A ∩ B): “given” vs “and”.
Assuming P(A|B) = P(B|A) just because the same letters appear.
Conditioning on an impossible event (forgetting the requirement P(B) > 0).
A fair die is rolled. Let A = “the roll is 2 or 3” and B = “the roll is less than 4”. Compute P(A|B).
Hint: Restrict the sample space to B first, then count outcomes in A within that restricted set.
Ω = {1,2,3,4,5,6}
A = {2,3}
B = {1,2,3}
A ∩ B = {2,3}
P(A|B) = |A ∩ B| / |B| = 2/3.
You are told that P(B) = 0.4 and P(A ∩ B) = 0.1. Compute P(A|B).
Hint: Use the definition P(A|B) = P(A ∩ B)/P(B).
Given P(B) = 0.4 and P(A ∩ B) = 0.1,
P(A|B) = 0.1 / 0.4 = 0.25.
A bag has 3 red balls and 2 blue balls. Two balls are drawn without replacement. Let A = “the second ball is red” and B = “the first ball is red”. Compute P(A|B).
Hint: After conditioning on B, update the bag’s composition before computing the probability of A.
Initially: 3R, 2B (5 total).
Condition on B: the first ball is red, so remove one red.
Remaining bag: 2R, 2B (4 total).
Event A: second ball is red.
So P(A|B) = 2/4 = 1/2.
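Exercise 3 can also be verified by brute-force enumeration of the ordered draws (the ball labels here are invented purely so the outcomes can be counted):

```python
from itertools import permutations
from fractions import Fraction

# Exercise 3 by enumeration: draw 2 balls from {3 red, 2 blue} without replacement.
balls = ["R1", "R2", "R3", "B1", "B2"]
draws = list(permutations(balls, 2))                 # 20 equally likely ordered pairs

B = [d for d in draws if d[0].startswith("R")]       # first ball red: 12 pairs
A_and_B = [d for d in B if d[1].startswith("R")]     # both red: 6 pairs

p = Fraction(len(A_and_B), len(B))
print(p)   # 1/2
```

The enumeration agrees with the update-the-bag reasoning: conditioning on B is the same as removing one red ball before the second draw.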
Next nodes: