Expected Value

Probability & Statistics · Difficulty: ★★☆☆☆ · Depth: 3 · Unlocks: 74

Long-run average of a random variable. E[X] = sum of x*P(x).


Core Concepts

  • Expected value = the theoretical long-run average of a random variable (the value the sample mean approaches under repeated independent draws)
  • Definition as a probability-weighted average: for discrete X, E[X] = ∑ₓ x·P(X = x); for continuous X, E[X] = ∫ x·f_X(x) dx
  • Existence criterion: the expectation is defined only if the weighted sum/integral converges (is finite); otherwise E[X] may be infinite or undefined

Key Symbols & Notation

E[X] (expectation operator on random variable X)

Essential Relationships

  • Linearity: E[aX + b] = a·E[X] + b for constants a, b

Prerequisites (1)

Referenced by (72)

Where this concept shows up in the operating-finance and personal-finance graphs.

From Business (48)

opportunity cost (Business)
Quantifying opportunity cost requires comparing the expected values of competing alternatives - E[X_chosen] vs E[X_forgone] - making expected value the core mathematical operation behind every opportunity cost calculation
Hiring Targets (Business)
Consistently exceeding hiring targets requires pipeline math: expected_hires = candidates × screen_pass_rate × interview_pass_rate × offer_accept_rate. Setting and beating targets is applied expected value over a recruiting funnel.
claims (Business)
Pricing insurance to cover average claims is literally computing E[X] over the claims distribution - the premium must at least equal the expected payout per policy plus expenses
Pipeline Volume (Business)
Weighted pipeline volume is literally expected value: sum of (deal_value × close_probability) across all opportunities
personal finance (Business)
The moderate-interest debt gray zone (4-7% APR) is an expected value comparison: guaranteed return of debt payoff vs probabilistic ~7-10% market return
risk-neutral (Business)
Risk-neutral agents maximize expected value directly - E[v_i x_i(b) - p_i(b)] - with no adjustment for variance or higher moments. The equivalence between risk-neutrality and pure expected-value maximization is the core mathematical consequence.
Forced Borrowing (Business)
Quantifies why maintaining liquidity is rational: E[cost avoided] = P(liquidity crisis) × forced borrowing cost ($5K-$20K), making the buffer's value calculable as a probability-weighted savings
rent-vs-buy decision (Business)
Probabilistic version: if skiing days are drawn from a distribution, the optimal buy-timing minimizes E[total cost] = E[min(days,tau)]*rent + I(days>=tau)*B, a direct expected value calculation.
insurance (Business)
Insurance pricing IS expected value computation: premium = E[claims] + margin, so the core question 'how much to charge' reduces to estimating E[X] over the loss distribution
Expected Return (Business)
Direct mathematical definition: E[X] = sum x*P(x). Expected return IS expected value applied to project cash flows. You need this to state the concept precisely.
Guaranteed Return (Business)
A guaranteed 50-100% return is an expected value with zero variance, which formalizes why it dominates any probabilistic alternative - no risk-adjusted comparison can beat a certain payoff of that magnitude.
Stock Returns (Business)
A stock return is literally the expected value of an operating outcome distribution - understanding E[X] = sum of x*P(x) is the mathematical foundation for reasoning about what returns actually are and why they vary
Expected Payoff (Business)
Expected payoff under mixed strategies is literally E[u_i(s)] where s is drawn from the joint mixed strategy profile; the entire concept reduces to computing expected value of a random variable whose distribution is determined by players' mixing probabilities.
Contingent Liabilities (Business)
Provisioning for contingent liabilities requires computing expected loss = P(materializing) × potential obligation amount, the fundamental expected value calculation
Risk-Adjusted Value (Business)
Risk-adjusted value is expected value penalized for uncertainty - the formula E[V] - λσ² starts from E[X] = Σ x·P(x) and subtracts a risk penalty, so expected value is the mathematical primitive being modified
Variance (Business)
Expected value is the quantity the business concept holds constant - understanding that E[X] alone is insufficient to characterize a distribution requires understanding both first and second moments together
Alpha (Business)
Alpha is formally E[R_portfolio] - E[R_benchmark]; understanding expected value as a weighted average over outcomes is the mathematical foundation for defining and measuring any 'spread' between your performance and a reference rate
Expected Market Return (Business)
Expected market return is literally E[R] = sum of r*P(r) over the distribution of possible market outcomes - the statistical expected value applied to return distributions
Capital Allocation (Business)
NPV - the CFO's primary capital allocation tool - is literally expected value of discounted future cash flows. Every automation investment decision reduces to E[returns] vs E[costs], weighted by probability of each outcome.
Capital Budgeting (Business)
NPV - the core tool of capital budgeting - is the expected present value of a project's uncertain future cash flows. Each period's cash flow is weighted by probability and discounted, making NPV a direct application of E[X] over a time-indexed random variable.
investment decision (Business)
Direct mathematical formalization: E[X] = sum of (payoff * probability) is exactly the calculation implied by 'cost, probability of success, and a payoff'
Underwriting (Business)
ROI underwriting is fundamentally expected value calculation - probability-weighted outcomes across scenarios (base, upside, downside) to arrive at an investment decision
Off-Balance-Sheet Risks (Business)
Valuing a contingent liability requires computing E[cost] = P(trigger) × magnitude. A co-signed loan has expected cost = P(default) × $50K; a tax assessment with range $20K-$80K needs a probability-weighted value. Expected value is the mathematical primitive for converting uncertain off-balance-sheet items into comparable dollar figures.
Co-Signing (Business)
The true cost of co-signing is E[loss] = P(default) × obligation_amount; people systematically fail to compute this expected value for contingent liabilities, which is why $50K loans and $20-80K tax exposures catch co-signers off guard
Tax Assessments (Business)
Contingent tax liabilities are best modeled as expected values: P(triggered) × assessment_amount. This is the correct framework for deciding how much to reserve against a liability that may or may not materialize
investment returns (Business)
The 5-7% real return IS an expected value of a random variable - understanding E[X] as a probability-weighted long-run average is the mathematical concept that makes 'expected returns' a precise claim rather than a guess
mortgage rate (Business)
The core comparison is guaranteed return (paydown saves 3-5% with certainty) vs E[X] of investment portfolio (5-7% expected but with variance) - understanding expected value formalizes when the spread justifies the risk
Lifetime Value (Business)
LTV is literally an expected value computation: E[revenue_per_period] times E[customer_lifetime]. The LTV = ARPU / churn formula derives from summing a geometric series of retention probabilities, but the core concept is the long-run average value of the customer-lifetime random variable.
entry fee (Business)
The core math behind entry fees: a rational bidder participates iff E[payoff from mechanism] >= entry fee. A bidder with v=0 has E[value]=0, so any positive entry fee makes participation negative-EV, which is exactly why entry fees cause revenue to differ across mechanisms with identical allocations.
Bid Shading (Business)
The optimal bid shade maximizes E[(v - b) * P(win|b)] - the tradeoff between surplus captured per win (v - b) and probability of winning, which is a direct expected value optimization
Bet Sizing (Business)
The yield function Y(x) computes E[return | invest x] - expected value is the mathematical core that makes the yield function calculable and bet sizing optimizable rather than intuitive
ROI underwriting (Business)
ROI underwriting is fundamentally expected value computation: probability-weighted cash flows across scenarios, discounted back to present value
pre-tax vs post-tax (Business)
The Traditional vs Roth decision is an expected value comparison under uncertainty: E[tax cost] of paying known rate now vs E[tax cost] of paying unknown future rate, weighted by beliefs about future bracket placement
Refinancing (Business)
Variable-rate refinancing options (ARMs, introductory teaser rates that reset) require expected value analysis across rate scenarios to compare against fixed-rate alternatives - the core question is whether E[total cost under variable] < total cost under fixed
Roth vs Traditional (Business)
Stripped of dogma, the decision is: compare E[marginal_rate_at_withdrawal] to marginal_rate_at_contribution. You must estimate the expected value of a future tax rate under uncertainty (career trajectory, legislative risk, retirement income mix) - not memorize 'Roth is always better for young people.'
high-interest debt (Business)
The decision to attack debt vs invest is an expected value comparison: E[debt payoff] = guaranteed APR return with zero variance vs E[market] ≈ 7-10% with substantial variance - the guaranteed return dominates when rates are comparable because it carries no risk
Early Mortgage Prepayment (Business)
The formal comparison: E[investing] is higher but uncertain, while prepayment is a guaranteed return equal to the mortgage rate - the decision hinges on whether you value expected value or certainty
Expected Total Cost (Business)
Expected total cost is literally an expected value calculation - E[Total Cost] = sum of cost_i * P(scenario_i) across possible outcomes for each consolidation option
index funds (Business)
The core case for index funds is an expected value argument: active manager alpha is zero-sum before fees and negative-sum after fees, so the expected return of passive indexing exceeds the expected return of active management net of costs.
Emergency Fund (Business)
The entire rationale for an emergency fund is an EV calculation: E[cost of no buffer] (penalty APR, late fees, new debt at high rates) exceeds the opportunity cost of idle cash
Hurdle Rate (Business)
Expected return E[R] of the opportunity is compared against the hurdle rate threshold - the entire pricing decision reduces to whether E[R] exceeds the minimum acceptable return
Future Value (Business)
"Expected return" is E[X] applied to investment returns; future value under uncertainty requires computing the expectation over a return distribution rather than assuming a fixed rate
Discounted Cash Flow (Business)
Real DCF operates on uncertain future cash flows - each CF_t is a random variable, and the valuation is E[Σ CF_t / (1+r)^t]. Probability-weighted scenarios are expected value calculations.
Net Present Value (Business)
NPV is the expected value of discounted future cash flows - E[CF_t / (1+r)^t] summed over periods. Uncertain automation payoffs require computing expectations over probability-weighted outcomes.
Portfolio Alpha (Business)
Alpha is literally defined as excess expected return over a benchmark. E[R_portfolio] - E[R_benchmark] = alpha. Cannot define, measure, or reason about portfolio alpha without expected value as the foundational concept.
NPV (Business)
NPV is a weighted sum of expected cash flows where weights are discount factors: NPV = sum of E[CF_t] / (1+r)^t - understanding E[X] = sum x*P(x) is the mathematical foundation for collapsing uncertain future payoffs into a single present value
Internal Rate of Return (Business)
NPV is structurally an expected value computation - sum of future cash flows weighted by discount factors, identical form to E[X] = sum of x*P(x)
LBO Modeling (Business)
LBO returns analysis computes probability-weighted IRR/MOIC across base, upside, and downside scenarios - the core of investment committee decision-making

From Money (24)

Compound Interest (Money)
Future value depends on expected return distributions
Opportunity Cost (Money)
Opportunity cost is the expected value of the next-best alternative
Employer 401k Match (Money)
100% match is a guaranteed return exceeding any expected market return
Debt Consolidation (Money)
Compare expected total cost across consolidation options
Insurance Basics (Money)
Insurance pricing is an expected value calculation: premium vs probability times loss
Disability Insurance (Money)
Income replacement value is the discounted expected future earnings stream
Life Insurance (Money)
Coverage amount is the present value of expected remaining income stream
Moderate-Interest Debt (Money)
Compare guaranteed debt payoff return vs uncertain market expected return
Pre-Tax vs Post-Tax (Money)
Traditional vs Roth is an expected value comparison under tax rate uncertainty
Traditional vs Roth IRA (Money)
Optimal wrapper depends on expected future vs current tax rate
15% Savings Rate (Money)
The target assumes expected real market returns of 5-7%
529 Education Savings (Money)
Education cost growth vs investment growth is an expected value comparison
Asset Classes (Money)
Return is the expected value of the outcome distribution
Dollar-Cost Averaging (Money)
DCA vs lump sum is an expected value comparison - lump sum wins roughly 2/3 of the time
Rent vs Buy (Money)
Break-even analysis compares expected total costs of renting vs owning
Rental Property Math (Money)
NOI and cap rate are expected return calculations from the property cash flow distribution
Return on Equity (Money)
Redeployment decisions compare expected returns across different uses of capital
Real Estate Leverage (Money)
Leveraged returns amplify both expected gains and expected losses
Roth Conversion Ladder (Money)
Optimal conversion amount depends on expected future tax rates
Large Purchase Planning (Money)
Time horizon determines which vehicle maximizes expected risk-adjusted value
FIRE Math (Money)
Safe withdrawal rate depends on expected real portfolio returns
Career as Asset (Money)
Career decisions are expected value calculations on human capital
Options Basics (Money)
Option premium equals the risk-neutral expected payoff
Alternative Investments (Money)
Illiquidity premium compensates for uncertain exit timing and higher variance
Advanced Learning Details

Graph Position

  • Depth Cost: 29
  • Fan-Out (ROI): 74
  • Bottleneck Score: 29
  • Chain Length: 3

Cognitive Load

  • Atomic Elements: 5
  • Total Elements: 13
  • Percentile Level: L0
  • Atomic Level: L3

All Concepts (5)

  • Expected value: a single-number summary of a random variable representing its long-run average outcome
  • Expectation defined for discrete random variables as a probability-weighted average of possible outcomes
  • Interpretation of expectation as the average result one would observe over many repeated independent trials (long-run frequency interpretation)
  • Existence/finiteness condition: the expectation is meaningful only when the defining sum converges to a finite value
  • Support-based summation: the expectation is computed by summing only over the random variable's possible values (its support)

Teaching Strategy

Self-serve tutorial - low prerequisites, straightforward concepts.

If you repeatedly play a lottery, flip a biased coin for points, or measure noisy sensor readings, you eventually want one number that summarizes what you “typically” get. Expected value is that number: the long-run average you should plan around, even though any single outcome can differ.

TL;DR:

The expected value (expectation) E[X] of a random variable X is its probability-weighted average. Discrete: E[X] = ∑ₓ x·P(X=x). Continuous: E[X] = ∫ x f_X(x) dx. Expectation is linear (E[aX+b]=aE[X]+b) and generalizes to E[g(X)]. It may be infinite or undefined if the weighted sum/integral doesn’t converge.

What Is Expected Value?

Why we need it (motivation)

A random variable X can take many values depending on chance. If you want to:

  • compare two gambling games,
  • decide whether an insurance policy is “fair,”
  • estimate average runtime of a randomized algorithm,
  • or reason about average loss in machine learning,

you need a single summary number.

Expected value is the theoretical long-run average: if you repeatedly draw independent samples X₁, X₂, … from the same distribution, the sample mean

\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i

tends to get close to a fixed value. That fixed value (when it exists) is E[X]. (A later node formalizes this as the Law of Large Numbers.)

Intuition first: “average with weights”

Suppose outcomes x happen with probabilities p(x). An ordinary average gives each outcome equal weight. Expected value gives outcome x the weight p(x).

So if big outcomes are rare, they still matter—but only proportionally to how often they occur.

Definition (discrete)

If X is discrete with possible values x and probability mass function P(X = x), the expectation is

\mathbb{E}[X] = \sum_x x\,\mathbb{P}(X=x).

You can read this as “sum over all outcomes: (value) × (chance of that value).”
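That reading translates directly into code. A minimal sketch, where `pmf` maps each possible value x to P(X = x); the `expected_value` helper and the lottery numbers are illustrative, not from a library:

```python
# Discrete expectation as a probability-weighted sum.

def expected_value(pmf):
    # Sanity check: the weights must form a probability distribution.
    assert abs(sum(pmf.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in pmf.items())

# A small lottery: win $100 with probability 0.01, otherwise $0.
lottery = {100: 0.01, 0: 0.99}
print(expected_value(lottery))  # ≈ 1.0, i.e. about $1 per play in the long run
```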

Definition (continuous)

If X is continuous with probability density function f_X(x), then

\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx.

This is the same idea: infinitely many possible values, so the weighted average becomes an integral.

Units and interpretation

  • E[X] has the same units as X. If X is dollars, E[X] is dollars.
  • E[X] is not necessarily a value X can actually take (e.g., the average of 0 and 1 is 0.5).

Difficulty calibration note

We’ll treat the following as core vs optional/advanced:

  • Core: compute E[X] for discrete/continuous distributions; linearity; interpretation.
  • Optional/Advanced: existence/undefined expectations; heavy tails; expectation of functions E[g(X)].

You can learn and apply expected value well with the core, then come back for the optional parts when you need them.

Core Mechanic 1: Computing E[X] as a Probability-Weighted Average

Discrete examples: build the pattern

For a discrete X, you need two things:

1) the set of possible values {x}, and

2) the probability of each value p(x).

Then compute the weighted sum ∑ x p(x).

Example pattern: dice

Let X be the value of a fair six-sided die. Then P(X=k)=1/6 for k∈{1,2,3,4,5,6}.

\mathbb{E}[X] = \sum_{k=1}^6 k\cdot \frac{1}{6} = \frac{1}{6}(1+2+3+4+5+6)=3.5.

Notice 3.5 is not an outcome on the die—expected value is a planning number, not a predicted single roll.
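The "long-run average" reading is easy to check by simulation; a quick sketch (seed and sample size are arbitrary choices):

```python
import random

# Simulate many fair-die rolls; the sample mean settles near E[X] = 3.5
# even though no individual roll is 3.5.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # ≈ 3.5
```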

Continuous examples: computing E[X] with integrals

For continuous X, the density f_X(x) plays the role of “probability per unit x.” The integral

\int x\, f_X(x)\, dx

is the continuous weighted average.

Example pattern: Uniform(0,1)

If X ∼ Uniform(0,1), then f_X(x)=1 for x∈[0,1] and 0 otherwise.

\mathbb{E}[X]=\int_0^1 x\cdot 1\, dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}.
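The same integral can be approximated numerically; a sketch using a midpoint Riemann sum (the helper name `expectation_numeric` is illustrative):

```python
# Approximate E[X] = ∫ x f(x) dx on [a, b] with a midpoint Riemann sum.

def expectation_numeric(f, a, b, n=10_000):
    h = (b - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))  # midpoint of each sub-interval
    return sum(m * f(m) * h for m in mids)

def uniform_density(x):
    return 1.0  # f(x) = 1 on [0, 1], zero elsewhere

print(expectation_numeric(uniform_density, 0.0, 1.0))  # ≈ 0.5
```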

Expectation as “center of mass” (intuition)

A useful physical analogy: imagine each value x has “mass” p(x) (discrete) or density f(x)dx (continuous). The expected value is the balance point.

  • If probability mass shifts right, E[X] increases.
  • If you add a rare but huge outcome, E[X] can jump noticeably.

Sanity checks when computing

1) Range check: If X is always between a and b, then E[X] must lie between a and b.

2) Symmetry: If a distribution is symmetric around 0, often E[X]=0 (when it exists).

3) Weights sum to 1: For discrete, verify ∑ p(x)=1; for continuous, ∫ f(x)dx=1.
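These checks are easy to automate for a discrete pmf; a sketch (the `check_pmf` helper is illustrative):

```python
# Run the sanity checks before trusting a computed expectation.

def check_pmf(pmf):
    values, probs = list(pmf), list(pmf.values())
    assert abs(sum(probs) - 1.0) < 1e-9        # check 3: weights sum to 1
    ev = sum(x * p for x, p in pmf.items())
    assert min(values) <= ev <= max(values)    # check 1: E[X] lies in the range
    return ev

symmetric = {-1: 0.25, 0: 0.5, 1: 0.25}       # symmetric around 0 ...
print(check_pmf(symmetric))                   # ... so check 2 predicts E[X] = 0.0
```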

Core takeaway

Computing expectation is usually straightforward bookkeeping—until you meet distributions with extremely large values or tails. That’s where the next sections add nuance.

Core Mechanic 2: Linearity of Expectation (the "superpower")

Why linearity matters

Many random variables are built from simpler pieces:

  • total reward = sum of per-step rewards,
  • total cost = sum of random costs,
  • total heads = sum of indicator variables,
  • ML loss over a dataset = average of per-example losses.

Computing the full distribution of a sum can be hard. Expected value often stays easy because expectation is linear.

Linearity rules (core)

For random variables X and Y (no independence required):

\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y].

For constants a, b:

\mathbb{E}[aX+b] = a\,\mathbb{E}[X] + b.

More generally, for any finite sum:

\mathbb{E}\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n \mathbb{E}[X_i].
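A tiny exact computation confirms that linearity needs no independence; here Y = 7 − X is completely determined by X (an illustrative choice):

```python
from fractions import Fraction

# Linearity with *dependent* variables: X is a fair die and Y = 7 - X.
# E[X + Y] = E[X] + E[Y] still holds exactly.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())
EY = sum((7 - x) * p for x, p in pmf.items())
EXplusY = sum((x + (7 - x)) * p for x, p in pmf.items())
print(EX, EY, EXplusY)  # 7/2 7/2 7
assert EXplusY == EX + EY  # no independence was needed
```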

Mini-derivation (discrete)

Assume (X,Y) are discrete.

Start with the definition:

\mathbb{E}[X+Y] = \sum_{x,y} (x+y)\,\mathbb{P}(X=x, Y=y).

Split the sum:

\mathbb{E}[X+Y] = \sum_{x,y} x\,\mathbb{P}(X=x, Y=y) + \sum_{x,y} y\,\mathbb{P}(X=x, Y=y).

Now notice:

\sum_{y} \mathbb{P}(X=x, Y=y) = \mathbb{P}(X=x)

so

\sum_{x,y} x\,\mathbb{P}(X=x, Y=y) = \sum_x x\,\mathbb{P}(X=x) = \mathbb{E}[X].

Similarly the second term becomes E[Y]. Therefore E[X+Y]=E[X]+E[Y].

Indicators: a common trick

Define an indicator random variable I for an event A:

  • I = 1 if A happens
  • I = 0 otherwise

Then

\mathbb{E}[I] = 1\cdot \mathbb{P}(A) + 0\cdot (1-\mathbb{P}(A)) = \mathbb{P}(A).

This turns probability questions into expectation questions.

Example: Let X be the number of heads in n coin flips (not necessarily fair). Let Iᵢ indicate “flip i is heads.” Then X = ∑ᵢ Iᵢ, so

\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[I_i] = \sum_{i=1}^n \mathbb{P}(\text{heads on } i).

If the coin has P(heads)=p each time, then E[X]=np.
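The indicator trick can be verified against brute-force enumeration of all outcomes; a sketch with arbitrary, unequal success probabilities:

```python
from itertools import product

# Indicator trick: E[total successes] = sum of the p_i, checked against
# enumeration over all 2^n outcomes of independent trials.
p = [0.1, 0.5, 0.9, 0.3]

linearity_answer = sum(p)  # E[X] = Σ E[I_i] = Σ p_i

brute = 0.0
for outcome in product([0, 1], repeat=len(p)):
    prob = 1.0
    for bit, pi in zip(outcome, p):
        prob *= pi if bit else (1 - pi)  # independent trials
    brute += sum(outcome) * prob

print(linearity_answer, brute)  # both ≈ 1.8
```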

What linearity does not say

A classic confusion is to assume expectation distributes over products:

  • Generally, E[XY] ≠ E[X]E[Y].

That equality holds under independence (and some integrability conditions), but linearity alone doesn’t give it.
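A one-line counterexample uses a fully dependent pair (Y = X, an illustrative choice):

```python
# Counterexample for products: take Y = X (fully dependent), X a fair die.
pmf = {k: 1 / 6 for k in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())       # ≈ 3.5
EXY = sum(x * x * p for x, p in pmf.items())  # E[XY] = E[X²] = 91/6 ≈ 15.17
print(EXY, EX * EX)  # ≈ 15.17 vs ≈ 12.25, so E[XY] ≠ E[X]E[Y] here
```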

Optional/Advanced: Expectation of Functions E[g(X)] and Existence Issues

E[g(X)] (operator viewpoint)

Expected value is not just a number attached to X—it’s an operator that maps a random variable to a number.

Often we care about a transformed quantity g(X):

  • squared error: g(X) = (X−c)²
  • absolute deviation: g(X)=|X|
  • utility in economics: g(X)=u(X)
  • loss in ML: g(X)=ℓ(X, y)

Definition (discrete)

\mathbb{E}[g(X)] = \sum_x g(x)\,\mathbb{P}(X=x).

Definition (continuous)

\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx.

This is the same weighted-average idea, just applied after transforming the outcomes.

Law of the unconscious statistician (LOTUS)

A subtle but powerful point: to compute E[g(X)], you usually do not need the distribution of Y=g(X). You can compute directly from X’s distribution using the formulas above.
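In code, this means evaluating g at each support point and reusing X's weights; a sketch (the `expect_g` helper is illustrative):

```python
# LOTUS in code: compute E[g(X)] from X's pmf directly;
# the distribution of Y = g(X) is never constructed.

def expect_g(pmf, g):
    return sum(g(x) * p for x, p in pmf.items())

pmf = {-2: 0.25, 0: 0.5, 2: 0.25}
print(expect_g(pmf, lambda x: x))      # E[X]  = 0.0 by symmetry
print(expect_g(pmf, lambda x: x * x))  # E[X²] = 2.0
```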

When expectation does not exist (or is infinite)

So far, we’ve treated expectation as always producing a finite number. But expectation is only well-defined when the weighted sum/integral converges.

A practical sufficient condition is that the absolute expectation is finite:

  • Discrete: ∑ₓ |x|·P(X=x) < ∞
  • Continuous: ∫ |x|·f_X(x) dx < ∞

If these diverge, several things can happen:

Situation | What it means | Typical phrase
E[X] is a finite real number | weighted average converges | "expectation exists"
E[X] = +∞ or −∞ | one-sided sum/integral diverges | "infinite expectation"
E[X] undefined | positive and negative parts both diverge | "does not exist"

Heavy tails (intuition)

A heavy-tailed distribution puts enough probability on huge values that the “average” never settles.

A famous example is the Cauchy distribution: it’s symmetric, but its tails are so heavy that E[X] is undefined (the integral does not converge in the required sense). That’s why sample means of Cauchy draws behave wildly even for large n.
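A simulation makes this instability visible; a sketch using the inverse-CDF transform (seed and sample sizes are arbitrary choices):

```python
import math
import random

# Sample means of Cauchy draws do not settle, because E[X] is undefined.
# Standard Cauchy via inverse CDF: X = tan(pi * (U - 1/2)), U ~ Uniform(0,1).
random.seed(1)

def cauchy_sample_mean(n):
    return sum(math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)) / n

# Even at n = 100,000 the means typically jump around from run to run,
# unlike the steady 3.5 of the fair die.
print([round(cauchy_sample_mean(100_000), 2) for _ in range(5)])
```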

Why this matters in practice

  • In modeling, assuming E[X] exists lets you use many theorems (LLN, variance formulas, etc.).
  • In risk/finance, rare catastrophic outcomes can dominate expectation.
  • In ML, expectation of loss is usually well-defined, but certain data distributions (extreme outliers) can make empirical averages unstable.

If you’re learning expectation for the first time, don’t let this scare you: most standard distributions used early (Bernoulli, Binomial, Uniform, Normal, Exponential) have finite expectations. But it’s important to know that “average” is not guaranteed by definition—it’s a property that may or may not hold.

Applications and Connections: From Fair Games to SGD

Fairness and pricing (games, insurance)

A gamble is often called “fair” if its expected payoff is 0 (or if the price equals expected payout).

If you pay cost c to play and receive random payout X, then net payoff is X−c. By linearity:

\mathbb{E}[X-c] = \mathbb{E}[X] - c.

A fair price is c = E[X] (ignoring risk preferences).

Expected loss (machine learning viewpoint)

In supervised learning, we often minimize expected risk:

R(\theta) = \mathbb{E}_{(x,y)\sim \mathcal{D}}\,[\ell(\theta; x,y)].

We don’t know the true distribution 𝒟, so we approximate R(θ) with the empirical average over data:

\hat R(\theta) = \frac{1}{n}\sum_{i=1}^n \ell(\theta; x_i, y_i).

The idea “sample average ≈ expected value” is exactly the intuition behind expected value and what the Law of Large Numbers formalizes.

Why SGD works with expectation

Stochastic Gradient Descent uses a random mini-batch to estimate the gradient of expected loss.

If g(θ) is the gradient computed from a random sample, SGD relies on it being (approximately) unbiased:

\mathbb{E}[g(\theta)] = \nabla R(\theta).

This is an expectation statement: on average, the noisy gradient points in the true direction.

Connecting expectation to what comes next

  • Variance measures spread around the mean μ = E[X] using E[(X−μ)²].
  • Entropy uses an expectation too: H(X)=E[−log p(X)] (discrete).
  • Law of Large Numbers explains the long-run convergence of sample means to E[X].
  • Game theory uses expected payoff when players randomize strategies.

Expected value is the first “global” summary of a distribution you should reach for: it’s simple, composable via linearity, and it’s the backbone of many later definitions.

Worked Examples (3)

Compute E[X] for a simple discrete gamble

A game pays $10 with probability 0.2, pays $2 with probability 0.5, and pays $0 with probability 0.3. Let X be the payout in dollars. Find E[X] and a fair entry price c (ignoring risk).

  1. List outcomes and probabilities:

    • x=10 with p=0.2
    • x=2 with p=0.5
    • x=0 with p=0.3
  2. Apply the discrete expectation formula:

    \mathbb{E}[X]=\sum_x x\,\mathbb{P}(X=x)=10(0.2)+2(0.5)+0(0.3).
  3. Compute:

    10(0.2)=2

    2(0.5)=1

    0(0.3)=0

    So

    \mathbb{E}[X]=2+1+0=3.
  4. A fair entry price c makes expected net payoff zero:

    Net payoff = X − c

    \mathbb{E}[X-c]=\mathbb{E}[X]-c=0 \Rightarrow c=\mathbb{E}[X]=3.

Insight: Expected value treats each payout as contributing “value × frequency.” Even though $10 is large, it happens only 20% of the time, so it contributes $2 to the average.

Use linearity with indicator variables: expected number of successes

You run 5 independent trials. Trial i succeeds with probability pᵢ (not necessarily the same across trials). Let X be the total number of successes. Compute E[X].

  1. Define indicator variables:

    Let Iᵢ = 1 if trial i succeeds, else 0.

    Then the total number of successes is

    X = \sum_{i=1}^5 I_i.
  2. Compute each indicator’s expectation:

    \mathbb{E}[I_i] = 1\cdot p_i + 0\cdot (1-p_i)=p_i.
  3. Apply linearity of expectation (no extra assumptions needed beyond finiteness):

    \mathbb{E}[X] = \mathbb{E}\left[\sum_{i=1}^5 I_i\right] = \sum_{i=1}^5 \mathbb{E}[I_i] = \sum_{i=1}^5 p_i.

Insight: Linearity lets you avoid computing the distribution of X. Even if the trials have different probabilities, the expected total is just the sum of the individual success probabilities.

Continuous expectation: E[X] for an exponential distribution

Let X have an exponential distribution with rate λ>0, meaning f_X(x)=λe^{−λx} for x≥0 and 0 otherwise. Compute E[X].

  1. Start from the definition:

    \mathbb{E}[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{\infty} x\, \lambda e^{-\lambda x}\,dx.
  2. Compute the integral using integration by parts.

    Let u = x so du = dx.

    Let dv = \lambda e^{-\lambda x} dx so v = -e^{-\lambda x}.

    Then

    \int_0^{\infty} x\, \lambda e^{-\lambda x}\, dx = \left[x(-e^{-\lambda x})\right]_0^{\infty} - \int_0^{\infty} (-e^{-\lambda x})\,dx.
  3. Evaluate the boundary term:

    As x→∞, x e^{−λx} → 0, so x(−e^{−λx}) → 0.

    At x=0, x(−e^{0}) = 0.

    So

    \left[x(-e^{-\lambda x})\right]_0^{\infty} = 0.
  4. Compute the remaining integral:

    -\int_0^{\infty} (-e^{-\lambda x})\, dx = \int_0^{\infty} e^{-\lambda x}\, dx = \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^{\infty} = 0 - \left(-\frac{1}{\lambda}\right)=\frac{1}{\lambda}.
  5. Therefore

    \mathbb{E}[X] = \frac{1}{\lambda}.

Insight: For continuous variables, expectation is still a weighted average—just spread across a continuum. The exponential distribution’s mean 1/λ matches the intuition: higher rate λ means shorter expected waiting time.
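The result 1/λ is easy to spot-check by simulation with the standard library (rate, seed, and sample size are arbitrary choices):

```python
import random

# Spot-check E[X] = 1/λ for the exponential distribution.
# random.expovariate takes the rate λ directly.
random.seed(0)
lam = 2.0
draws = [random.expovariate(lam) for _ in range(200_000)]
mean_draws = sum(draws) / len(draws)
print(mean_draws)  # ≈ 1/λ = 0.5
```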

Key Takeaways

  • Expected value E[X] is the theoretical long-run average of a random variable, aligning with the sample mean under repeated draws (formalized later by LLN).

  • Discrete expectation is a probability-weighted sum: E[X]=∑ₓ x·P(X=x). Continuous expectation is a probability-weighted integral: E[X]=∫ x f_X(x) dx.

  • Linearity is the main computational tool: E[X+Y]=E[X]+E[Y] and E[aX+b]=aE[X]+b, without needing independence.

  • Indicator variables convert probabilities into expectations: if I is 1 on event A and 0 otherwise, then E[I]=P(A).

  • Expectation generalizes to transformations: E[g(X)] = ∑ g(x)p(x) or ∫ g(x)f(x)dx (often without finding the distribution of g(X)).

  • E[X] may be infinite or undefined for heavy-tailed distributions; finiteness typically requires ∑ |x|p(x) < ∞ or ∫ |x|f(x)dx < ∞.

  • Expected value is foundational for variance, entropy, fair games, and optimizing expected loss in machine learning.

Common Mistakes

  • Thinking E[X] must be a value X can actually take (e.g., expecting a die to roll 3.5).

  • Forgetting that probabilities must sum/integrate to 1 before computing E[X], leading to incorrect weighted averages.

  • Assuming E[XY]=E[X]E[Y] without checking independence (linearity does not apply to products).

  • Ignoring existence: applying expectation formulas to heavy-tailed cases where the sum/integral diverges, producing misleading “answers.”

Practice

easy

A biased coin lands heads with probability p. Let X be the payout where you get $1 for heads and $0 for tails. Compute E[X].

Hint: List the two outcomes (1 and 0) and weight by their probabilities.

Show solution

Outcomes: X=1 with probability p, and X=0 with probability 1−p.

\mathbb{E}[X] = 1\cdot p + 0\cdot (1-p) = p.

So the expected payout is p dollars.

medium

Let X be the result of a fair six-sided die. Define Y = 2X − 1. Compute E[Y] using linearity (do not re-sum from scratch).

Hint: First recall E[X] for a fair die, then apply E[aX+b]=aE[X]+b.

Show solution

For a fair die, E[X] = 3.5. Using linearity:

\mathbb{E}[Y]=\mathbb{E}[2X-1]=2\,\mathbb{E}[X]-1=2(3.5)-1=7-1=6.
hard

Optional/advanced: Let X take values 1,2,3,… with probability P(X=k)=c/k^2 for k≥1. (a) Find c. (b) Does E[X] exist as a finite number?

Hint: Use that ∑_{k=1}^∞ 1/k^2 converges (to π²/6). For part (b), examine ∑ k·(c/k²).

Show solution

(a) We need probabilities to sum to 1:

\sum_{k=1}^{\infty} \frac{c}{k^2} = c \sum_{k=1}^{\infty} \frac{1}{k^2} = 1.

Using ∑_{k=1}^∞ 1/k² = π²/6, we get

c\cdot \frac{\pi^2}{6}=1 \Rightarrow c=\frac{6}{\pi^2}.

(b) Compute expectation:

\mathbb{E}[X] = \sum_{k=1}^{\infty} k\cdot \frac{c}{k^2} = c\sum_{k=1}^{\infty} \frac{1}{k}.

But ∑_{k=1}^∞ 1/k diverges, so E[X] is infinite (does not exist as a finite number). In this case we say the expectation diverges to +∞.
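Numerically, the partial sums of this expectation keep growing instead of settling; a short sketch:

```python
import math

# Partial sums of E[X] = c · Σ 1/k grow without bound (harmonic divergence),
# so the expectation is infinite rather than converging to a number.
c = 6 / math.pi**2
for n in (10, 1_000, 100_000):
    partial = c * sum(1 / k for k in range(1, n + 1))
    print(n, round(partial, 3))  # keeps increasing as n grows
```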

Connections

Next nodes you can unlock and why they depend on expected value:

  • Variance: uses the mean μ = E[X] and defines spread via Var(X) = E[(X−μ)²].
  • Entropy: can be written as an expectation, e.g. discrete H(X) = E[−log p(X)].
  • Law of Large Numbers: formal statement that the sample mean approaches E[X] under conditions.
  • Game Theory Introduction: expected payoff evaluates mixed (randomized) strategies.
  • Stochastic Gradient Descent: relies on unbiased gradient estimates, the expectation identity E[g(θ)] ≈ ∇R(θ).