A confidence interval gives a range of plausible values for a population parameter.
Point estimates (like θ̂) give you a single best guess. Confidence intervals add the missing piece: how uncertain that guess is, using a principled link between your data, your estimator’s sampling variation, and probability.
A (1 − α) confidence interval (CI) is a random interval computed from data that contains the fixed true parameter θ with probability 1 − α over repeated sampling. Most common CIs look like
θ̂ ± (critical value) · se(θ̂)
where se(θ̂) measures typical sampling error. Interpretation is about the procedure’s long-run coverage, not the probability that θ lies in a specific realized interval.
Suppose you estimate a population parameter θ (a mean, a proportion, a regression coefficient) using a point estimator θ̂ computed from sample data. Even if θ̂ is “good” (unbiased or consistent), it will vary from sample to sample.
If we only report θ̂, we hide an essential fact: a different random sample would yield a different estimate. A confidence interval is an interval estimate: a data-derived range intended to contain θ.
A common beginner intuition is: “After I compute an interval, there’s a 95% probability that θ is inside.” That’s not the frequentist definition.
In frequentist statistics, θ is a fixed (unknown) constant. Randomness comes from the sampling process. So a confidence interval is a random interval [L(X), U(X)] computed from random data X.
A (1 − α) CI is a procedure that satisfies the coverage property:
P( L(X) ≤ θ ≤ U(X) ) = 1 − α
The probability is over repeated sampling of X under the assumed model.
Imagine repeating the same experiment many times, each time computing a 95% CI using the same method. About 95% of those intervals will contain the true θ. About 5% will miss it.
Once you have computed one interval from your one dataset, the interval is no longer random; it’s a fixed numerical range. The method used to compute it is what carries the 95% guarantee.
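This repeated-sampling picture is easy to check empirically. A minimal simulation sketch, assuming Normal data with known σ so the z-interval applies (μ, σ, n, and the number of repetitions are arbitrary illustrative choices):

```python
import random

def z_interval(xs, sigma, z=1.96):
    """95% CI for the mean when sigma is known: X̄ ± z · σ/√n."""
    n = len(xs)
    xbar = sum(xs) / n
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

random.seed(0)
mu, sigma, n, reps = 10.0, 2.0, 30, 2000

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(sample, sigma)
    hits += lo <= mu <= hi   # did this interval capture the true mean?

coverage = hits / reps
print(f"empirical coverage: {coverage:.3f}")  # lands near 0.95
```

Each simulated interval either captures μ or misses it; the 95% refers to the long-run hit rate of the procedure, exactly as described above.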
Confidence intervals and Bayesian credible intervals sound similar, but they condition on different things: a credible interval treats θ as random given the observed data, while a confidence interval treats the data as random and θ as fixed. Confidence intervals do not, by default, assign probabilities to θ.
Most classical CIs follow a common template:
θ̂ ± (critical value) · se(θ̂)
This is the “margin of error” idea made precise.
The template is a consequence of: (1) a sampling distribution for θ̂, often approximately Normal via the Central Limit Theorem (CLT), and (2) a pivot or standardized statistic whose distribution we can control.
You already know the CLT: sums/averages of many random variables tend toward a Normal distribution. Confidence intervals are where that knowledge becomes actionable.
A confidence interval is a statement about what happens across repeated samples. So we need the distribution of θ̂ across those repeated samples: the sampling distribution.
If θ̂ has a distribution centered at θ with some spread, we can pick an interval around θ̂ that captures θ most of the time.
The standard error se(θ̂) is the standard deviation of θ̂’s sampling distribution:
se(θ̂) = √Var(θ̂)
You can read it as: “How much θ̂ typically wiggles due to sampling.”
Crucially, se(θ̂) is about the estimator's variability across samples, not the variability of individual data points. The canonical example is the sample mean.
Let X₁, …, Xₙ be i.i.d. with mean μ and variance σ². The sample mean is
μ̂ = X̄ = (1/n) ∑ᵢ Xᵢ
Then
E[X̄] = μ
Var(X̄) = σ² / n
se(X̄) = σ / √n
So uncertainty shrinks like 1/√n. This is why sample size helps, but with diminishing returns.
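A quick simulation can confirm the σ/√n formula: draw many samples, record each sample mean, and compare the empirical spread of those means to σ/√n (μ, σ, n, and the repetition count are arbitrary illustrative values):

```python
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 5.0, 4.0, 25, 20000

# The sampling distribution of X̄: many independent sample means.
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
         for _ in range(reps)]

theoretical_se = sigma / n ** 0.5        # σ/√n = 4/5 = 0.8
empirical_se = statistics.stdev(means)   # SD of the simulated sample means
print(theoretical_se, round(empirical_se, 2))
```

The empirical SD of the sample means should sit very close to 0.8, while the SD of the raw data stays near σ = 4: that gap is exactly the SD-vs-SE distinction.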
For large n, the CLT says
(X̄ − μ) / (σ/√n) ≈ Normal(0, 1)
Equivalently,
X̄ ≈ Normal( μ, σ²/n )
This approximation is the workhorse behind many “Wald-type” confidence intervals.
In real problems, σ is usually unknown. We estimate it with the sample standard deviation s.
When data are Normal, we have an exact result:
T = (X̄ − μ) / (s/√n) ∼ t_{n−1}
That t distribution is wider than Normal(0,1), reflecting the extra uncertainty from estimating σ.
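How much wider? A comparison using two-sided 95% critical values, with the t values hard-coded from a standard t table (if SciPy is available, `scipy.stats.t.ppf(0.975, df)` computes them directly):

```python
z = 1.960                                    # Normal(0,1) critical value, 95% two-sided
t_crit = {15: 2.131, 24: 2.064, 100: 1.984}  # t_{df, 0.975} from a t table

for df, t in t_crit.items():
    widening = 100 * (t / z - 1)             # % extra CI width from using t
    print(f"df={df:>3}: t={t:.3f}  ({widening:.1f}% wider than z)")
```

As df grows, t approaches z, so the correction matters most in small samples.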
Often, for an estimator θ̂ (like an MLE), we have an asymptotic Normal approximation:
θ̂ ≈ Normal( θ, Var(θ̂) )
Then a standardized version
Z = (θ̂ − θ) / se(θ̂)
is approximately Normal(0,1). This is the bridge from sampling distribution to an interval.
To construct a CI, you typically need:
1) A point estimator θ̂
2) An estimate of its standard error se(θ̂)
3) A critical value c such that
P(−c ≤ Z ≤ c) = 1 − α
Then solve the inequality for θ.
If you remember only one operational fact:
A CI width is controlled primarily by se(θ̂).
Everything else (Normal vs t, exact vs approximate, 90% vs 95%) is a multiplier on that base uncertainty. If your SE estimate is poor, your CI will be misleading.
To build an interval, we want a statistic with a known distribution that does not depend on the unknown θ (or depends on it in a controllable way). Such a statistic is called a pivot.
Once we have a pivot, we can take quantiles of its distribution and algebraically rearrange to isolate θ.
Assume we have an estimator θ̂ such that
Z = (θ̂ − θ) / se(θ̂) ≈ Normal(0, 1)
Let z_{1−α/2} be the (1 − α/2) quantile of the standard Normal. Then
P( −z_{1−α/2} ≤ Z ≤ z_{1−α/2} ) ≈ 1 − α
Substitute Z:
P( −z_{1−α/2} ≤ (θ̂ − θ)/se(θ̂) ≤ z_{1−α/2} ) ≈ 1 − α
Now solve for θ. Multiply through by se(θ̂) (positive):
P( −z_{1−α/2}·se(θ̂) ≤ θ̂ − θ ≤ z_{1−α/2}·se(θ̂) ) ≈ 1 − α
Rearrange terms:
Left inequality:
θ̂ − θ ≥ −z·se
⇒ −θ ≥ −z·se − θ̂
⇒ θ ≤ θ̂ + z·se
Right inequality:
θ̂ − θ ≤ z·se
⇒ −θ ≤ z·se − θ̂
⇒ θ ≥ θ̂ − z·se
Combine:
P( θ̂ − z_{1−α/2}·se(θ̂) ≤ θ ≤ θ̂ + z_{1−α/2}·se(θ̂) ) ≈ 1 − α
So the (approximate) (1 − α) CI is
[ θ̂ − z_{1−α/2}·se(θ̂), θ̂ + z_{1−α/2}·se(θ̂) ]
This is the famous “estimate ± margin of error.”
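The derivation compresses to a few lines of code. A generic sketch using the standard library's Normal quantile (`statistics.NormalDist.inv_cdf`); the θ̂ and se values below are purely illustrative:

```python
from statistics import NormalDist

def wald_ci(theta_hat, se, alpha=0.05):
    """(1 − α) Wald CI: θ̂ ± z_{1−α/2} · se(θ̂)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for α = 0.05
    return theta_hat - z * se, theta_hat + z * se

# Illustrative inputs: θ̂ = 0.23 with se ≈ 0.021.
lo, hi = wald_ci(0.23, 0.0210416)
print(round(lo, 3), round(hi, 3))  # → 0.189 0.271
```

Passing a different `alpha` (0.10, 0.01) changes only the critical value, which is why the same function serves every confidence level.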
If your pivot is t-distributed (common for means with unknown σ under Normal data), you replace z_{1−α/2} with t_{n−1, 1−α/2}.
Common two-sided critical values:
| Confidence | α | z_{1−α/2} (approx) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.96 |
| 99% | 0.01 | 2.576 |
Higher confidence means a larger multiplier, hence wider intervals.
Sometimes you only care about an upper bound or lower bound.
For an upper (1 − α) bound, you want
P( Z ≤ z_{1−α} ) = 1 − α
leading to an upper confidence bound:
θ ≤ θ̂ + z_{1−α}·se(θ̂)
Similarly for a lower bound.
The Wald CI is simple, but it can behave poorly when:
- n is small, so the Normal approximation to θ̂'s sampling distribution is poor
- θ is near a boundary of the parameter space (e.g., a proportion near 0 or 1)
- the sampling distribution of θ̂ is skewed, or the SE estimate itself is unreliable
This is why you will later see intervals like Wilson score intervals (for proportions), likelihood-based intervals, or bootstrap intervals.
Still, the pivot logic remains the same: find (or approximate) a statistic with a known distribution, then invert it.
| Method | Form | Needs | Pros | Cons |
|---|---|---|---|---|
| z/Wald CI | θ̂ ± z·se | se(θ̂), Normal approx | Simple, general | Can mis-cover in small n / boundaries |
| t CI (mean) | X̄ ± t·(s/√n) | Normal data (or robust large n) | Better small-sample than z | Still assumes reasonably symmetric behavior |
| Likelihood-based | {θ: 2(ℓ(θ̂)−ℓ(θ)) ≤ c} | Log-likelihood ℓ(θ) | Often better shape/constraints | More computation, theory |
| Bootstrap | quantiles of θ̂* | resampling procedure | Flexible | Can fail for dependent data or weird estimators |
For this node, the main goal is to master what a CI means and how the standard “critical value × SE” construction arises.
Confidence intervals are a lingua franca for uncertainty.
They let you:
- report uncertainty alongside every point estimate
- compare results across studies on a common scale
- connect estimation to hypothesis testing
- translate precision targets into sample-size requirements
A two-sided (1 − α) CI for θ corresponds to a two-sided hypothesis test at level α:
H₀: θ = θ₀
Rule: reject H₀ at level α if and only if θ₀ falls outside the (1 − α) CI.
Why this works: both procedures are built from the same pivot and critical region.
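A tiny sketch of the duality (the interval endpoints below are illustrative):

```python
def reject_h0(theta0, ci):
    """Two-sided level-α test via the corresponding (1 − α) CI."""
    lo, hi = ci
    return not (lo <= theta0 <= hi)

ci_95 = (11.33, 13.47)          # e.g. a 95% CI for a mean
print(reject_h0(12.0, ci_95))   # → False: 12.0 is plausible at α = 0.05
print(reject_h0(14.0, ci_95))   # → True: 14.0 is rejected
```

Reading the interval as "every θ₀ the data would not reject" is exactly the compatibility view developed below.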
Suppose your CI has half-width (margin of error) m:
m = z_{1−α/2} · se(θ̂)
For a mean with known σ:
m = z · (σ/√n)
Solve for n to achieve desired precision m:
m = zσ/√n
⇒ √n = zσ/m
⇒ n = (zσ/m)²
This turns “How many samples do we need?” into a quantitative design question.
Given a 95% CI [L, U], it's correct to say:
- "The procedure that produced [L, U] captures θ in about 95% of repeated samples."
- "Values inside [L, U] are reasonably compatible with the observed data under the model."
Be cautious with:
- "There is a 95% probability that θ lies in [L, U]." (The frequentist guarantee attaches to the method, not to one realized interval.)
- "About 95% of the data fall in [L, U]." (A CI describes uncertainty about θ, not the spread of individual observations.)
CI width increases when:
- the confidence level rises (larger critical value)
- the data are more variable (larger σ or s)
- the sample size is smaller
CI width decreases when:
- n grows (at rate 1/√n, so with diminishing returns)
- the data are less variable
- you accept a lower confidence level
For many MLEs θ̂, under regularity conditions:
θ̂ ≈ Normal( θ, I(θ)^{-1} )
where I(θ) is Fisher information (or n times per-sample information). In practice we plug in θ̂:
se(θ̂) ≈ √( I(θ̂)^{-1} )
Then a Wald CI becomes
θ̂ ± z_{1−α/2} · √( I(θ̂)^{-1} )
This is one of the most common “automatic CI” outputs in statistical software.
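A concrete sketch for Bernoulli data: with n i.i.d. Bernoulli(p) observations, the Fisher information is I(p) = n / (p(1 − p)), so the plug-in Wald interval reduces to the familiar proportion CI (inputs chosen for illustration):

```python
from statistics import NormalDist

def bernoulli_fisher_ci(p_hat, n, alpha=0.05):
    """Wald CI from plug-in Fisher information: I(p̂) = n / (p̂(1 − p̂))."""
    info = n / (p_hat * (1 - p_hat))          # Fisher information at p̂
    se = (1 / info) ** 0.5                    # = √(p̂(1 − p̂)/n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return p_hat - z * se, p_hat + z * se

lo, hi = bernoulli_fisher_ci(0.23, 400)
print(round(lo, 3), round(hi, 3))  # → 0.189 0.271
```

This matches computing se = √(p̂(1−p̂)/n) directly, as it must: the information route and the direct-variance route are the same calculation here.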
A CI is often best treated as the set of parameter values that are reasonably compatible with your observed data under the assumed model.
This mindset helps you avoid the trap of treating 0.049 vs 0.051 p-values as fundamentally different: an interval shows a continuum of plausible effects.
Coverage (the 95% guarantee) depends on the assumptions used to derive se(θ̂) and the pivot distribution:
- independence (or the assumed dependence structure) of the observations
- a correct, or adequately approximate, model for the data
- a sample size large enough for the Normal/t approximation to hold
If those fail, your interval can under-cover (too narrow) or over-cover (too wide).
You measure a quantity in n = 16 independent trials. The sample mean is X̄ = 12.4 and the sample standard deviation is s = 2.0. Assume the data are approximately Normal. Construct a 95% CI for μ.
Identify the estimator and its SE:
μ̂ = X̄ = 12.4
se(μ̂) = s/√n = 2.0/√16 = 2.0/4 = 0.5
Choose the correct critical value:
Because σ is unknown and we assume Normal data, use a t distribution with df = n − 1 = 15.
For a 95% two-sided interval, use t_{15, 0.975} ≈ 2.131.
Compute the margin of error:
ME = t · se = 2.131 · 0.5 = 1.0655
Form the interval:
Lower = 12.4 − 1.0655 = 11.3345
Upper = 12.4 + 1.0655 = 13.4655
So the 95% CI is approximately [11.33, 13.47].
Insight: The interval width is driven by se = s/√n. If you quadruple n, √n doubles and the CI half-width roughly halves (holding s similar). The switch from z to t is a small-sample correction for estimating σ.
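The same steps in code, with the t critical value hard-coded from a t table:

```python
import math

n, xbar, s = 16, 12.4, 2.0
t_crit = 2.131                    # t_{15, 0.975}, from a t table

se = s / math.sqrt(n)             # 2.0 / 4 = 0.5
me = t_crit * se                  # margin of error ≈ 1.0655
lower, upper = xbar - me, xbar + me
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

Swapping `t_crit` for 1.96 would shrink the half-width from about 1.066 to 0.98, which is exactly the small-sample correction the t distribution provides.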
In n = 400 trials, you observe x = 92 successes. Estimate the population success probability p and compute an approximate 95% CI using the Wald method.
Compute the point estimate:
The MLE for p is p̂ = x/n = 92/400 = 0.23
Compute the estimated standard error:
For a Bernoulli proportion,
Var(p̂) = p(1−p)/n
Plug in p̂ for p:
se(p̂) ≈ √( p̂(1−p̂)/n )
= √( 0.23·0.77 / 400 )
= √( 0.1771 / 400 )
= √( 0.00044275 )
≈ 0.02104
Use z critical value for 95%:
z_{0.975} ≈ 1.96
Compute margin of error:
ME = 1.96 · 0.02104 ≈ 0.0412
Form the interval:
Lower = 0.23 − 0.0412 = 0.1888
Upper = 0.23 + 0.0412 = 0.2712
So the 95% Wald CI is approximately [0.189, 0.271].
Insight: This works well here because n is large and p̂ is not near 0 or 1. For smaller n or extreme proportions, Wald intervals can misbehave (even going below 0 or above 1), motivating better alternatives like Wilson or likelihood-based intervals.
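To see the boundary problem concretely, here is a sketch comparing the Wald interval with the Wilson score interval for a small sample with an extreme proportion (x = 1 success in n = 20 trials is an illustrative choice):

```python
import math

def wald(p_hat, n, z=1.96):
    """Wald interval: p̂ ± z · √(p̂(1−p̂)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def wilson(p_hat, n, z=1.96):
    """Wilson score interval: invert the score test instead of the Wald test."""
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half

# p̂ = 1/20 = 0.05, near the boundary of [0, 1].
print(wald(0.05, 20))    # lower endpoint is negative: impossible for a probability
print(wilson(0.05, 20))  # stays inside [0, 1]
```

The Wald endpoint below zero is a symptom of the Normal approximation failing near the boundary; Wilson keeps the interval inside the parameter space.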
You want a 95% CI for a mean μ with margin of error at most m = 0.5. Prior studies suggest σ ≈ 3. How large should n be (approx)?
Start from the margin of error formula (Normal approximation):
m = z_{0.975} · (σ/√n)
Solve for n:
0.5 = 1.96 · (3/√n)
⇒ 0.5 = 5.88 / √n
⇒ √n = 5.88 / 0.5 = 11.76
⇒ n = (11.76)² = 138.2976
Round up to ensure the margin of error target:
Choose n = 139 (or 140 for convenience).
Insight: Precision requirements translate into quadratic sample size growth: halving the margin of error requires roughly 4× as many samples. This is a practical consequence of the 1/√n rate from the CLT.
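The calculation above as a small helper, rounding up so the margin-of-error target is guaranteed:

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Smallest n with z · σ/√n ≤ margin (Normal approximation)."""
    return math.ceil((z * sigma / margin) ** 2)

n = sample_size_for_mean(sigma=3.0, margin=0.5)
print(n)  # → 139
```

Halving the margin to 0.25 roughly quadruples the requirement, illustrating the quadratic growth noted above.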
A (1 − α) confidence interval is a random interval procedure with coverage P(L(X) ≤ θ ≤ U(X)) = 1 − α over repeated samples.
θ is fixed (unknown); the interval is random before you see data. After computing it, the interval is fixed but the method has a long-run guarantee.
Most common CIs take the form θ̂ ± (critical value) · se(θ̂). The SE controls the basic width.
The CLT and asymptotic Normality (often for MLEs) justify many practical CIs via the pivot Z = (θ̂ − θ)/se(θ̂).
Use t critical values for mean CIs when σ is unknown (especially in small samples under Normal assumptions).
Higher confidence levels widen intervals; larger n narrows them at rate 1/√n.
Two-sided (1 − α) CIs correspond to two-sided hypothesis tests at level α: reject θ = θ₀ iff θ₀ is outside the CI.
Coverage depends on assumptions (independence, correct model, adequate sample size). Bad assumptions can lead to under-coverage (too-narrow intervals).
Interpreting a 95% CI as “there is a 95% probability that θ is in this particular interval” (that’s not the frequentist meaning).
Confusing SD with SE: SD is variability in data; SE is variability in an estimator across samples.
Using z = 1.96 automatically in small samples for means with unknown σ (should often use t with df = n − 1).
Trusting Wald intervals near boundaries (e.g., proportions near 0 or 1) or with small n, where coverage can be poor.
You have n = 25 observations with X̄ = 80 and s = 10. Assuming approximate Normality, compute a 95% CI for μ.
Hint: Use a t interval: X̄ ± t_{n−1,0.975} · s/√n with df = 24.
se = s/√n = 10/√25 = 10/5 = 2.
Critical value: t_{24,0.975} ≈ 2.064.
ME = 2.064 · 2 = 4.128.
CI = 80 ± 4.128 = [75.872, 84.128] ≈ [75.87, 84.13].
In a survey of n = 200 people, x = 30 respond “yes.” Compute an approximate 90% Wald CI for the proportion p.
Hint: p̂ = x/n. Use z_{0.95} ≈ 1.645 and se(p̂) ≈ √(p̂(1−p̂)/n).
p̂ = 30/200 = 0.15.
se ≈ √(0.15·0.85/200) = √(0.1275/200) = √(0.0006375) ≈ 0.02525.
ME = 1.645 · 0.02525 ≈ 0.0415.
90% CI ≈ 0.15 ± 0.0415 = [0.1085, 0.1915] ≈ [0.109, 0.192].
A parameter estimator θ̂ is approximately Normal with se(θ̂) = 0.8. (a) Give a 95% Wald CI in terms of θ̂. (b) What happens to the CI width if you want 99% instead?
Hint: Use θ̂ ± z·se with z_{0.975} ≈ 1.96 and z_{0.995} ≈ 2.576. Compare margins of error.
(a) 95% CI: θ̂ ± 1.96·0.8 = θ̂ ± 1.568, i.e., [θ̂ − 1.568, θ̂ + 1.568].
(b) 99% CI: θ̂ ± 2.576·0.8 = θ̂ ± 2.0608. The half-width increases from 1.568 to 2.061 (about 31% wider), because the critical value is larger.