Basic measures of central tendency for data sets.
Deep-dive lesson: an accessible entry point, but dense material. Use the worked examples and spaced repetition.
When you summarize a data set with “one number,” you’re really choosing a definition of “typical.” Mean, median, and mode are three different (and useful) ways to say what “typical” means—especially when your data are messy, skewed, or have outliers.
Mean (x̄) averages all values; median is the middle value after sorting; mode is the most frequent value. Mean uses every number but is sensitive to outliers. Median is robust to outliers. Mode captures the most common category/value and can be more than one (or none).
Data sets can be long lists of numbers: test scores, delivery times, heights, or daily temperatures. Looking at a whole list makes it hard to compare groups or describe what’s normal.
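A measure of central tendency compresses a data set into a single representative value. But “representative” can mean different things: the balance point of all the values (mean), the middle value in sorted order (median), or the most common value (mode).
These are not interchangeable; they answer different questions.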
Suppose we have a data set of n values:
x₁, x₂, …, xₙ
1) Mean (arithmetic mean)
2) Median
3) Mode
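If a distribution is symmetric with a single peak and no extreme outliers, mean ≈ median ≈ mode.
But with skew or outliers, the three separate: the mean is pulled toward the long tail or the extreme values, while the median stays near the middle of the bulk of the data.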
| Measure | What it uses | What it represents | Sensitive to outliers? | Can be multiple? |
|---|---|---|---|---|
| Mean (x̄) | All values | Balance point / average | Yes | No |
| Median | Order only | Middle value | No (very robust) | No |
| Mode | Counts/frequency | Most common value | Not in the same way | Yes (bimodal, multimodal) |
The mean answers: “If I had to replace the whole data set with a single number that preserves the total, what number would that be?”
If you sum all values and then distribute that sum equally across n items, each would get the mean.
For values x₁, x₂, …, xₙ, the sample mean is
x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ
Here n is the number of values and xᵢ is the i-th value.
The expression
∑ᵢ₌₁ⁿ xᵢ
means:
x₁ + x₂ + x₃ + … + xₙ
So the mean is:
x̄ = (x₁ + x₂ + … + xₙ) / n
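The formula translates directly into code. Below is a minimal Python sketch; the function name `mean` is our own choice, and it assumes a non-empty list of numbers.

```python
def mean(values):
    """Arithmetic mean: x̄ = (x₁ + x₂ + … + xₙ) / n."""
    if not values:
        # With n = 0 the formula would divide by zero, so the mean is undefined.
        raise ValueError("mean of an empty data set is undefined")
    total = sum(values)  # ∑ xᵢ
    n = len(values)      # number of values
    return total / n


print(mean([5, 7, 7, 9, 12]))  # 8.0
```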
If x̄ is the mean, then
n · x̄ = ∑ᵢ₌₁ⁿ xᵢ
This is a simple but powerful fact: using the mean keeps the sum consistent.
Imagine each data value xᵢ as a weight placed on a number line. The mean is where the system balances.
Another way to say this uses deviations from the mean:
Let dᵢ = xᵢ − x̄.
Then the deviations sum to zero:
∑ᵢ₌₁ⁿ (xᵢ − x̄) = 0
Show the work:
∑ᵢ₌₁ⁿ (xᵢ − x̄)
= ∑ᵢ₌₁ⁿ xᵢ − ∑ᵢ₌₁ⁿ x̄
= ∑ᵢ₌₁ⁿ xᵢ − n·x̄
= ∑ᵢ₌₁ⁿ xᵢ − ∑ᵢ₌₁ⁿ xᵢ
= 0
This is one reason the mean appears everywhere in statistics and machine learning: it’s the “center” that makes positive and negative deviations cancel.
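A quick numerical check of the zero-sum property, sketched in Python with the task-time data from the first worked example. With floating-point numbers the sum of deviations can come out as a tiny nonzero value rather than exactly 0, so the check below uses a tolerance.

```python
import math


def deviations(values):
    """Return dᵢ = xᵢ − x̄ for each value."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]


d = deviations([5, 7, 7, 9, 12])
print(d)                                        # [-3.0, -1.0, -1.0, 1.0, 4.0]
print(math.isclose(sum(d), 0.0, abs_tol=1e-9))  # True
```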
Because the mean uses every value, one extreme point can change it a lot.
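Example intuition: take five values where four of them are 10. If the fifth is also 10, the mean is 10; if the fifth is 1000, the mean becomes (4·10 + 1000)/5 = 208.
Even though 4 out of 5 values are still 10, the mean shifts dramatically.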
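Use the mean when the data are roughly symmetric with few outliers, or when the total matters (since n · x̄ preserves the sum).
Be cautious when the data are skewed or contain extreme values, because a few very large or very small values can pull the mean away from where most of the data sit.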
Sometimes “typical” should not be affected by extremes.
Likewise, for categorical data (like shirt sizes), mean may not even make sense—but mode still does.
1) Sort the data from smallest to largest.
2) If n is odd: the median is the single middle value.
3) If n is even: the median is the average of the two middle values.
Let the sorted values be:
x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎
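The three steps can be written as a short Python function. This is a sketch with a function name of our choosing; Python's standard library also provides `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Middle value after sorting; average of the two middle values when n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    s = sorted(values)  # step 1: x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍ₙ₎
    n = len(s)
    mid = n // 2        # 0-based index of the middle (odd n) or upper-middle (even n) position
    if n % 2 == 1:
        # step 2 (odd n): s[mid] is the value at 1-based position (n + 1)/2
        return s[mid]
    # step 3 (even n): average the values at 1-based positions n/2 and n/2 + 1
    return (s[mid - 1] + s[mid]) / 2


print(median([12, 5, 9, 7, 7]))  # 7
print(median([4, 10, 6, 8]))     # 7.0
```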
The median depends only on the order of the values, not on their magnitudes beyond their relative positions.
So if you change an extreme value (like the maximum) but it stays at the far end, the median does not change when n ≥ 3, because the maximum is never one of the middle values.
Because median is determined by the middle position(s), outliers have limited influence.
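For example, replacing the largest value in 10, 11, 11, 12, 13 with 1000 leaves the median at 11.
Use the median when the data are skewed or have big outliers and you care about the typical central experience.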
The mode is the value (or values) that occur most frequently.
Mode answers: “What is the most common outcome?”
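This is especially useful for categorical data (like shirt sizes) and for discrete values that repeat often, such as counts.
Mode can be unstable if values rarely repeat (as with precise continuous measurements) or if the sample is small, because adding or removing a single observation can change which value is most frequent.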
In practice, for continuous data we often bin values (histograms) and talk about the most common range rather than the exact value.
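A sketch of both ideas in Python, using `collections.Counter` to count frequencies. `modes` returns every value tied for the highest count, so multimodal data are handled; when every value appears exactly once it returns all of them, and whether to call that case “no mode” is a convention you would have to choose. `modal_bin` is a hypothetical helper for continuous data: it groups values into bins of a chosen `width` and reports the start of the most common bin(s). The delivery times in the usage line are made-up illustration values.

```python
import math
from collections import Counter


def modes(values):
    """All values that share the highest frequency (one or more)."""
    if not values:
        return []
    counts = Counter(values)    # value -> number of occurrences
    top = max(counts.values())  # highest frequency
    return [v for v, c in counts.items() if c == top]


def modal_bin(values, width):
    """Hypothetical helper: start(s) of the most common bin [k·width, (k+1)·width)."""
    bins = Counter(math.floor(x / width) * width for x in values)
    top = max(bins.values())
    return [b for b, c in bins.items() if c == top]


print(modes(["M", "S", "M", "L"]))                         # ['M']
print(modal_bin([12.3, 14.8, 15.1, 15.9, 16.4, 22.0], 5))  # [15]
```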
A common pattern in right-skewed data (long tail to the right) is:
mode ≤ median ≤ mean
This happens because the mean is pulled toward the tail.
In left-skewed data:
mean ≤ median ≤ mode
These are not laws, but good intuition checks.
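These orderings can be checked on small data sets with Python's standard `statistics` module (`mean`, `median`, and `multimode` are all part of it; `multimode` needs Python 3.8+). The right-skewed list is the task-time data from the first worked example; the left-skewed list is a made-up illustration. Taking the first entry of `multimode` is an arbitrary choice in this sketch that only matters when there are ties.

```python
import statistics


def centers(values):
    """Return (mode, median, mean); uses the first mode if several tie."""
    mode = statistics.multimode(values)[0]
    return mode, statistics.median(values), statistics.mean(values)


right_skewed = [5, 7, 7, 9, 12]   # long tail to the right
left_skewed = [1, 8, 10, 10, 11]  # long tail to the left

mo, med, avg = centers(right_skewed)
print(mo <= med <= avg)  # True  (7 ≤ 7 ≤ 8)

mo, med, avg = centers(left_skewed)
print(avg <= med <= mo)  # True  (8 ≤ 10 ≤ 10)
```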
In practice, computing mean/median/mode is easy. The hard part is choosing the right one for the question and the data.
Often teams report both the mean and the median; a large gap between them is a quick signal of skew or outliers.
For categorical data like “small/medium/large,” mean is not meaningful.
| If your data look like… | And you care about… | Prefer… |
|---|---|---|
| Symmetric, few outliers | Overall average level | Mean (x̄) |
| Skewed or has big outliers | Typical central experience | Median |
| Categories or repeated discrete values | Most common outcome | Mode |
| You want a full picture | Different notions of “typical” | Report mean + median (and sometimes mode) |
Mean, median, and mode are the first step toward richer summaries of a data set.
If you remember one guiding idea: mean uses magnitudes, median uses order, mode uses counts.
Data (minutes to finish a task): 5, 7, 7, 9, 12
Mean (x̄):
Compute the sum using ∑:
∑ᵢ₌₁⁵ xᵢ = 5 + 7 + 7 + 9 + 12 = 40
Then divide by n = 5:
x̄ = (1/5)·40 = 8
Median:
The data are already sorted: 5, 7, 7, 9, 12
n = 5 is odd, so the middle position is (n+1)/2 = 3
Median = the 3rd value = 7
Mode:
Count frequencies:
5 occurs 1 time
7 occurs 2 times
9 occurs 1 time
12 occurs 1 time
Mode = 7
Insight: Here the mean (8) is higher than the median (7) because the larger values (9, 12) pull the average upward a bit. The mode matches the median because 7 is both common and centrally located.
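The same steps in Python, as a sketch that mirrors the hand calculation rather than calling a library (variable names are our own):

```python
from collections import Counter

minutes = [5, 7, 7, 9, 12]
n = len(minutes)

# Mean: compute ∑ xᵢ, then divide by n
total = sum(minutes)  # 40
x_bar = total / n     # 8.0

# Median: sort, then take position (n + 1)/2 = 3 (valid here because n is odd)
s = sorted(minutes)
median = s[(n + 1) // 2 - 1]  # 1-based position 3 -> 0-based index 2

# Mode: count frequencies, keep the value(s) with the highest count
counts = Counter(minutes)
top = max(counts.values())
modes = [v for v, c in counts.items() if c == top]

print(x_bar, median, modes)  # 8.0 7 [7]
```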
Original data: 10, 11, 11, 12, 13
Add an outlier: 100
Original mean:
Sum = 10 + 11 + 11 + 12 + 13 = 57
n = 5
x̄ = 57/5 = 11.4
Original median:
The data are already sorted.
Middle value (3rd) = 11
Median = 11
New data set with outlier: 10, 11, 11, 12, 13, 100
New mean:
New sum = 57 + 100 = 157
n = 6
x̄ = 157/6 ≈ 26.1667
New median:
With n = 6 (even), median is the average of the 3rd and 4th values.
3rd value = 11, 4th value = 12
Median = (11 + 12)/2 = 11.5
Insight: One outlier changed the mean from 11.4 to about 26.17 (a huge shift), but the median moved only from 11 to 11.5. This is why median is often preferred for skewed data.
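A sketch reproducing this comparison with Python's standard `statistics` module; formatting the mean to four decimals matches the ≈ 26.1667 above.

```python
import statistics

original = [10, 11, 11, 12, 13]
with_outlier = original + [100]  # add the single outlier

for label, data in (("original", original), ("with outlier", with_outlier)):
    avg = statistics.mean(data)
    med = statistics.median(data)  # averages the two middle values when n is even
    print(f"{label}: mean ≈ {avg:.4f}, median = {med}")
```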
Data (number of messages received per day for 8 days): 2, 2, 3, 3, 4, 5, 5, 7
Count frequencies:
2 occurs 2 times
3 occurs 2 times
4 occurs 1 time
5 occurs 2 times
7 occurs 1 time
Identify the maximum frequency:
The highest count is 2
List all values with that frequency:
Mode(s) = 2, 3, 5 (multimodal)
Insight: Mode isn’t always a single number. When multiple values tie for highest frequency, the data can be multimodal—useful for spotting multiple common behaviors.
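In Python, `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, in the order each first appears. By contrast, `statistics.mode` returns only the first mode encountered when there are ties, which makes it easy to fall into the “assuming the mode is unique” mistake.

```python
import statistics

messages = [2, 2, 3, 3, 4, 5, 5, 7]

print(statistics.multimode(messages))  # [2, 3, 5]
print(statistics.mode(messages))       # 2  (only the first of the tied modes)
```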
Mean (x̄) is the arithmetic average: x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ.
Median is the middle value after sorting (or the average of the two middle values when n is even).
Mode is the most frequent value; it can be one value, multiple values, or (in some cases) none.
Mean uses magnitudes and is sensitive to outliers; median uses order and is robust to outliers.
Mode is especially useful for categorical data where mean/median may not be meaningful.
For right-skewed data, a common pattern is mode ≤ median ≤ mean (mean pulled right).
A good summary often reports more than one measure (e.g., mean and median together).
Computing the median without sorting the data first.
Forgetting that when n is even, the median is the average of the two middle values (not one of them).
Assuming the mode must be unique; ties can create multiple modes.
Using the mean as “typical” for highly skewed data and getting a misleading result.
Compute mean, median, and mode for: 1, 2, 2, 2, 9
Hint: Mean uses the sum; median is the 3rd value after sorting (since n = 5); mode is the most frequent.
Sorted data: 1, 2, 2, 2, 9
Mean: (1+2+2+2+9)/5 = 16/5 = 3.2
Median: 3rd value = 2
Mode: 2
Find the median of: 4, 10, 6, 8
Hint: Sort, then average the two middle values because n is even.
Sorted: 4, 6, 8, 10
Median = (6 + 8)/2 = 7
A data set has mean x̄ = 12 for n = 5 values. The first four values are 10, 12, 12, 14. Find the fifth value.
Hint: Use n·x̄ = ∑ xᵢ to get the total sum, then subtract the known values.
Total sum = n·x̄ = 5·12 = 60
Known sum = 10 + 12 + 12 + 14 = 48
Fifth value = 60 − 48 = 12
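The same rearrangement of n·x̄ = ∑ xᵢ as a small Python function; the name `missing_value` and its parameters are hypothetical, chosen for this sketch.

```python
def missing_value(x_bar, n, known):
    """Value needed so that n values, including the known ones, have mean x_bar."""
    if len(known) != n - 1:
        raise ValueError("expected exactly n - 1 known values")
    total = n * x_bar          # n·x̄ = ∑ xᵢ
    return total - sum(known)  # subtract the known part of the sum


print(missing_value(12, 5, [10, 12, 12, 14]))  # 12
```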