Variables whose values are outcomes of random phenomena. Discrete vs continuous.
Deep-dive lesson: an accessible entry point, but dense material. Use worked examples and spaced repetition.
When you roll a die, “randomness” lives in the outcome. A random variable is the bridge that turns that outcome into a number you can analyze—so you can compute probabilities, averages, and compare different random phenomena using a shared language.
A random variable X is a function from outcomes ω in a sample space Ω to real numbers: X: Ω → ℝ. Its distribution can be summarized completely by the cumulative distribution function F_X(x) = P(X ≤ x). Discrete random variables use a PMF (probability mass function) and sums; continuous random variables use a PDF (density) and integrals, with P(X = x) = 0 for any single point.
In basic probability, you often talk directly about outcomes: heads/tails, die faces 1–6, card suits, etc. But most questions you care about are numerical: how many heads? what total did the dice show? how long until the first success?
Outcomes themselves can be messy objects (a full sequence of flips, a shuffled deck order, a user session log). A random variable lets you extract a number from each outcome so that probability tools can focus on the number.
A random variable is not “a random number floating around.” It is a rule (a function).
Let Ω be a sample space, and let ω ∈ Ω be an outcome. A random variable is a function
X: Ω → ℝ
meaning: for every outcome ω, the random variable outputs a real number X(ω).
Example A: one die roll
Define X(ω) = ω, the face showing, so X takes values in {1, …, 6}. Here X is almost trivial—but it sets the pattern.
Example B: two coin flips
Define X = number of heads. Then: X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0.
Notice what happened: multiple outcomes map to the same number. That’s normal and important.
Often Ω is huge, but your question depends on a small numeric summary. The random variable is that summary.
For learning probability, keep this mental model: the outcome ω is random; the rule X is fixed; the number X(ω) inherits the randomness of ω.
Once you have X, you can ask probability questions about X without constantly referring to ω.
If X is a function, you still need to know: “How likely is each possible value?”
There are multiple ways to describe this “likeliness,” but the most universal is the cumulative distribution function (CDF). It works for discrete and continuous cases, and it fully characterizes the distribution.
For a random variable X, the cumulative distribution function is
F_X(x) = P(X ≤ x)
Read it as: “the probability that X takes a value at most x.”
If you know F_X(x) for all real x, you can recover probabilities of intervals like P(a < X ≤ b).
PMFs and PDFs differ, but CDFs behave consistently.
For any real numbers a < b:
P(a < X ≤ b) = F_X(b) − F_X(a)
Derivation (showing the idea): for a < b, the event {X ≤ b} is the disjoint union of the events {X ≤ a} and {a < X ≤ b}, so their probabilities add.
So
P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b)
Rearrange:
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a)
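As a quick sanity check of the difference formula (a minimal sketch; the fair-die example and helper name `F` are mine), brute-force counting and CDF differences agree:

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # fair die: each face carries mass 1/6

def F(x):
    """CDF of a fair die roll: P(X <= x), computed by counting faces."""
    return Fraction(sum(1 for f in faces if f <= x), len(faces))

a, b = 2, 5
# Direct count of outcomes with a < X <= b (faces 3, 4, 5)
direct = Fraction(sum(1 for f in faces if a < f <= b), len(faces))
# Same probability via the CDF difference F(b) - F(a)
via_cdf = F(b) - F(a)
assert direct == via_cdf == Fraction(1, 2)
```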
A valid CDF F_X(x) must satisfy: 1) it is non-decreasing in x; 2) lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1; 3) it is right-continuous.
Be careful with language: capital X names the random variable (the function), while lowercase x is a candidate value. Writing P(X ≤ x) mixes both on purpose.
Later you’ll see that in discrete distributions you can use differences of the CDF to get point probabilities, while in continuous distributions differences give interval probabilities (and point probabilities are 0).
A random variable X is discrete if it takes values in a countable set, like {0, 1, 2, 3, …} or a finite set like {1, …, 6}.
In a discrete world, probability is stored as mass placed on individual points.
For a discrete random variable, the PMF is
p_X(x) = P(X = x)
It assigns a probability to each value x that X can take.
Two must-know conditions:
1) p_X(x) ≥ 0 for all x
2) ∑ over all possible x of p_X(x) = 1
If X is discrete, the CDF is a sum of masses up to x:
F_X(x) = P(X ≤ x) = ∑_{t ≤ x} p_X(t)
This is why the CDF looks like steps: each time you pass a value with mass, the CDF jumps.
If X is integer-valued (common case), you can recover point probabilities by differences:
p_X(k) = P(X = k) = F_X(k) − F_X(k − 1)
More generally, for any point x:
P(X = x) = F_X(x) − lim_{t↑x} F_X(t)
(That “left limit” is what captures the jump size at x.)
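The difference trick can be sketched in code (assuming X = number of heads in two fair coin flips, whose CDF values are used below):

```python
from fractions import Fraction

# CDF values for X = number of heads in two fair coin flips (support {0, 1, 2})
cdf = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(1)}

def F(k):
    """P(X <= k) for integer k."""
    if k < 0:
        return Fraction(0)
    return cdf[min(k, 2)]

# Point probabilities are consecutive CDF differences: p(k) = F(k) - F(k-1)
pmf = {k: F(k) - F(k - 1) for k in [0, 1, 2]}
assert pmf == {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```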
Let Ω = {HH, HT, TH, TT} with equal probability 1/4 each.
Define X = number of heads.
Then X takes values {0, 1, 2}.
Compute the PMF: p_X(0) = P({TT}) = 1/4, p_X(1) = P({HT, TH}) = 1/2, p_X(2) = P({HH}) = 1/4.
That’s a full description of the discrete distribution.
Any probability question becomes a sum.
For example:
P(X ≥ 1) = P(X = 1) + P(X = 2) = 1/2 + 1/4 = 3/4
And using the CDF:
P(X ≥ 1) = 1 − P(X ≤ 0) = 1 − F_X(0) = 1 − 1/4 = 3/4
You’re already seeing two equivalent perspectives: point masses (PMF) and cumulative probabilities (CDF).
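Both perspectives can be checked by enumerating the sample space (a minimal sketch; the helper names are mine):

```python
from itertools import product
from fractions import Fraction

# Sample space: all sequences of two flips, each outcome with probability 1/4
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# PMF: probability mass at each possible value of X
pmf = {v: Fraction(sum(1 for w in omega if X(w) == v), len(omega))
       for v in [0, 1, 2]}

# Two equivalent computations of P(X >= 1)
via_pmf = pmf[1] + pmf[2]
via_cdf = 1 - pmf[0]          # 1 - F_X(0), since F_X(0) = p_X(0) here
assert via_pmf == via_cdf == Fraction(3, 4)
```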
Some quantities aren’t naturally countable: waiting times, response latencies, physical measurements, proportions between 0 and 1.
You might model such a quantity as a continuous random variable. Here, probability is not concentrated on points. Instead, it is spread smoothly over intervals.
For a continuous random variable X:
P(X = x) = 0 for any single real number x
This is not a bug—it’s a consequence of having uncountably many possible values. Probability lives on intervals.
A continuous random variable is described by a probability density function (PDF) f_X(x) such that:
P(a ≤ X ≤ b) = ∫ from a to b f_X(x) dx
And the total probability is 1:
∫_{−∞}^{∞} f_X(x) dx = 1
Also f_X(x) ≥ 0.
Important: f_X(x) is a density, not a probability; it can exceed 1 on short intervals, and only its integrals over intervals are probabilities.
The CDF is still
F_X(x) = P(X ≤ x)
and it relates to the PDF by an integral:
F_X(x) = ∫_{−∞}^{x} f_X(t) dt
If f_X is nice enough (continuous), then differentiation recovers the PDF:
f_X(x) = d/dx F_X(x)
Same rule as always:
P(a < X ≤ b) = F_X(b) − F_X(a)
This is one reason the CDF is the “universal” summary: it works without caring whether X is discrete or continuous.
Suppose X is uniformly distributed on [0, 1]. Intuitively: every equal-length interval inside [0, 1] has equal probability.
PDF:
f_X(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.
CDF:
F_X(x) = 0 for x < 0; F_X(x) = x for 0 ≤ x ≤ 1; F_X(x) = 1 for x > 1.
Notice: the CDF is continuous (no jumps), and for any 0 ≤ a < b ≤ 1, P(a ≤ X ≤ b) = F_X(b) − F_X(a) = b − a, so probability equals interval length.
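The integral relationship can be verified numerically (a sketch using a simple Riemann sum; the helper names are mine, and the result is an approximation):

```python
def f(x):
    """PDF of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def F(x, n=100_000):
    """CDF via a left Riemann sum of the density from 0 up to x."""
    if x <= 0.0:
        return 0.0
    upper = min(x, 1.0)
    dx = upper / n
    return sum(f(i * dx) * dx for i in range(n))

# F(x) should be approximately x on [0, 1], and interval probabilities
# come from CDF differences.
assert abs(F(0.5) - 0.5) < 1e-3
assert abs((F(0.5) - F(0.2)) - 0.3) < 1e-3
```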
In discrete settings you sum masses.
In continuous settings you integrate density.
A useful comparison table:
| Concept | Discrete RV | Continuous RV |
|---|---|---|
| Values X can take | countable | uncountable interval(s) |
| Point probability | P(X = x) can be > 0 | P(X = x) = 0 |
| Main descriptor | PMF p_X(x) | PDF f_X(x) |
| Normalization | ∑ p_X(x) = 1 | ∫ f_X(x) dx = 1 |
| Interval probability | ∑ over x in interval | ∫ over interval |
| CDF | staircase | continuous (no jumps) |
If you remember only one line: discrete = mass on points, continuous = density over intervals.
Random variables are the entry point to almost everything in probability and statistics: expectations, common distribution families, limit theorems, and statistical models all build on them.
But the first modeling step is always the same:
1) Identify the sample space Ω (what outcomes are possible?)
2) Define a random variable X: Ω → ℝ (what number do you care about?)
3) Describe its distribution (PMF/PDF/CDF)
Below are common ways to define X.
An indicator is a random variable that turns an event into 0/1.
Let A be an event. Define
X(ω) = 1 if ω ∈ A, else 0.
This seems simple, but it becomes a building block for counting.
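A minimal sketch (the event "die roll is even" is my choice of example): the probability that an indicator equals 1 is just the probability of its event.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

def indicator_even(w):
    """Indicator of the event A = {roll is even}: 1 if w is in A, else 0."""
    return 1 if w % 2 == 0 else 0

# P(X = 1) is exactly P(A): count the outcomes where the indicator fires
p_one = Fraction(sum(indicator_even(w) for w in faces), len(faces))
assert p_one == Fraction(1, 2)
```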
You can count something in the outcome: the number of heads in n flips, the number of defective items in a batch.
Or sum values: the total of two dice, the total of several independent measurements.
These lead naturally to discrete distributions.
Time-to-complete, response latency, measurement noise: these are naturally continuous.
Even when you don’t know a neat PMF/PDF, you can often reason in terms of CDFs: a question like “what is the chance the latency is at most t?” is exactly a question about F_X(t).
This node unlocks two big next steps:
1) Expected Value
Once X is defined, you can compute E[X] as a weighted average (sum/integral). This turns a distribution into a single representative number.
2) Common Distributions
Many random variables you define match standard families (Bernoulli, binomial, uniform, normal). Learning those families gives you ready-made PMFs/PDFs/CDFs.
When someone says “Let X be a random variable…”, train yourself to ask: What is the sample space Ω? What number does X(ω) assign to each outcome? Is X discrete or continuous? What is its PMF/PDF/CDF?
If you can answer those, you’re in control of the randomness.
Flip two fair coins. Let Ω = {HH, HT, TH, TT} with each outcome probability 1/4. Define the random variable X(ω) = number of heads in ω. Find p_X(x) and F_X(x).
List X(ω) for each outcome: X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0.
Compute the PMF p_X(x) = P(X = x): p_X(0) = 1/4, p_X(1) = 2/4 = 1/2, p_X(2) = 1/4.
Write the CDF F_X(x) = P(X ≤ x): F_X(x) = 0 for x < 0; 1/4 for 0 ≤ x < 1; 3/4 for 1 ≤ x < 2; 1 for x ≥ 2.
Check via differences (sanity check): F_X(1) − F_X(0) = 3/4 − 1/4 = 1/2 = p_X(1), and F_X(2) − F_X(1) = 1 − 3/4 = 1/4 = p_X(2).
Insight: The random variable merges multiple outcomes into the same value (HT and TH both map to 1). The CDF shows this as jumps: each jump size equals the probability mass at that point.
Let X be Uniform(0, 1): f_X(x) = 1 for 0 ≤ x ≤ 1, otherwise 0. Compute (1) F_X(x), (2) P(0.2 ≤ X ≤ 0.5), and (3) P(X = 0.3).
Compute the CDF F_X(x) = ∫_{−∞}^{x} f_X(t) dt by cases.
Case 1: x < 0.
Then f_X(t) = 0 for all t ≤ x, so
F_X(x) = ∫_{−∞}^{x} 0 dt = 0.
Case 2: 0 ≤ x ≤ 1.
Then
F_X(x) = ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 1 dt
= 0 + [t]_{0}^{x}
= x.
Case 3: x > 1.
Then
F_X(x) = ∫_{−∞}^{0} 0 dt + ∫_{0}^{1} 1 dt + ∫_{1}^{x} 0 dt
= 0 + 1 + 0
= 1.
Compute the interval probability:
P(0.2 ≤ X ≤ 0.5) = ∫_{0.2}^{0.5} 1 dx
= [x]_{0.2}^{0.5}
= 0.5 − 0.2
= 0.3.
Compute the point probability:
P(X = 0.3) = 0 (continuous RVs assign zero probability to any single point).
Insight: For continuous RVs, f_X(x) is not a probability—it’s a density. Probabilities come from areas (integrals), and the CDF is the accumulated area up to x.
Roll two fair six-sided dice. The sample space is pairs (d₁, d₂) with 36 equally likely outcomes. Define two random variables:
S(d₁, d₂) = d₁ + d₂ (sum), and M(d₁, d₂) = max(d₁, d₂) (maximum). Compute P(S = 7) and P(M ≤ 3).
Compute P(S = 7): count outcomes (d₁, d₂) such that d₁ + d₂ = 7.
The pairs are:
(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
There are 6 favorable outcomes out of 36 total.
So P(S = 7) = 6/36 = 1/6.
Compute P(M ≤ 3): this means both dice are ≤ 3, because max(d₁, d₂) ≤ 3 ⇔ d₁ ≤ 3 and d₂ ≤ 3.
Number of outcomes with d₁ ∈ {1,2,3} and d₂ ∈ {1,2,3} is 3 · 3 = 9.
So P(M ≤ 3) = 9/36 = 1/4.
Insight: The randomness is the same (two dice), but the question changes the mapping X(ω). Different random variables on the same Ω lead to different distributions and probabilities.
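The two mappings on the same sample space can be sketched by enumeration (helper names are mine):

```python
from itertools import product
from fractions import Fraction

# One sample space, two different random variables defined on it
omega = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def S(d):
    """Sum of the two dice."""
    return d[0] + d[1]

def M(d):
    """Maximum of the two dice."""
    return max(d)

p_s7 = Fraction(sum(1 for d in omega if S(d) == 7), len(omega))
p_m3 = Fraction(sum(1 for d in omega if M(d) <= 3), len(omega))
assert p_s7 == Fraction(1, 6)
assert p_m3 == Fraction(1, 4)
```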
A random variable X is a function X: Ω → ℝ mapping each outcome ω to a real number X(ω).
Capital letters (X) denote the random variable (the mapping); lowercase (x) denotes a realized/possible value.
The CDF F_X(x) = P(X ≤ x) fully characterizes the distribution and works for both discrete and continuous RVs.
Discrete random variables have a PMF p_X(x) = P(X = x) and probabilities are computed with sums.
Continuous random variables have a PDF f_X(x) and probabilities are computed with integrals: P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx.
For continuous RVs, P(X = x) = 0 for any single point; only intervals can have positive probability.
You can compute interval probabilities using the CDF: P(a < X ≤ b) = F_X(b) − F_X(a).
Thinking a random variable is a single random outcome, rather than a function that assigns a number to each outcome.
Confusing PDF values with probabilities (e.g., treating f_X(0.3) as P(X = 0.3)).
Forgetting that continuous point probabilities are zero and trying to compute P(X = x) via the PDF directly.
Mixing up the roles of X and x: writing P(x ≤ 3) instead of P(X ≤ 3).
A fair die is rolled. Define X as the indicator that the outcome is even: X = 1 if the roll is even, else 0. Find (1) P(X = 1), (2) F_X(x) for all x.
Hint: Even outcomes are {2,4,6}. For the CDF, consider ranges: x < 0, 0 ≤ x < 1, x ≥ 1.
We have P(X = 1) = P(even) = 3/6 = 1/2.
CDF: F_X(x) = 0 for x < 0; F_X(x) = 1/2 for 0 ≤ x < 1; F_X(x) = 1 for x ≥ 1.
Let X be a discrete RV with PMF: p_X(0)=0.2, p_X(1)=0.5, p_X(3)=0.3. Compute (1) F_X(1), (2) P(0 < X ≤ 3), (3) P(X = 2).
Hint: F_X(1) = P(X ≤ 1). For P(0 < X ≤ 3), use either summation over allowed values or F_X(3) − F_X(0).
(1) F_X(1) = P(X ≤ 1) = p_X(0)+p_X(1) = 0.2+0.5 = 0.7.
(2) P(0 < X ≤ 3) includes X=1 or X=3 (since 2 is not possible here): 0.5+0.3 = 0.8.
Equivalently: F_X(3) − F_X(0) = 1 − 0.2 = 0.8.
(3) P(X = 2) = 0 because 2 is not in the support (not assigned any mass).
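All three answers can be checked by treating the PMF as a dictionary of point masses (a minimal sketch; `F` is my helper, not part of the exercise):

```python
from fractions import Fraction

# PMF from the exercise: mass only at the support points 0, 1, and 3
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 3: Fraction(3, 10)}

def F(x):
    """CDF: sum the masses at support points <= x."""
    return sum(p for v, p in pmf.items() if v <= x)

assert F(1) == Fraction(7, 10)                 # (1) F_X(1) = 0.7
assert F(3) - F(0) == Fraction(4, 5)           # (2) P(0 < X <= 3) = 0.8
assert pmf.get(2, Fraction(0)) == 0            # (3) no mass at 2
```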
Let X be continuous with PDF f_X(x) = 2x for 0 ≤ x ≤ 1 (and 0 otherwise). Compute (1) F_X(x) for 0 ≤ x ≤ 1, and (2) P(0.5 ≤ X ≤ 1).
Hint: Integrate: F_X(x) = ∫_0^x 2t dt. Then use either an integral over [0.5,1] or CDF differences.
(1) For 0 ≤ x ≤ 1:
F_X(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{0}^{x} 2t dt = [t²]_{0}^{x} = x².
(2) P(0.5 ≤ X ≤ 1) = F_X(1) − F_X(0.5) = 1² − (0.5)² = 1 − 0.25 = 0.75.
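The closed-form answer F_X(x) = x² can be cross-checked numerically (a sketch using a midpoint Riemann sum; helper names are mine, and the values are approximations):

```python
def f(x):
    """PDF: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def F(x, n=100_000):
    """CDF via a midpoint Riemann sum; should match x**2 on [0, 1]."""
    if x <= 0.0:
        return 0.0
    upper = min(x, 1.0)
    dx = upper / n
    return sum(f((i + 0.5) * dx) * dx for i in range(n))

assert abs(F(0.5) - 0.25) < 1e-6               # matches x**2 at x = 0.5
assert abs((F(1.0) - F(0.5)) - 0.75) < 1e-6    # P(0.5 <= X <= 1)
```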
Next nodes:
Related refreshers:
Later connections: