Using randomness for efficiency. Las Vegas vs Monte Carlo.
Deterministic algorithms commit to one path. Randomized algorithms flip coins to explore many possible paths—often making the algorithm dramatically simpler or faster. The price is that the algorithm’s behavior becomes a distribution: its running time and/or its correctness depends on random bits.
A randomized algorithm is A(x; r): on the same input x, different random bits r can change the execution. Las Vegas algorithms are always correct but have random running time (analyzed via E[T]). Monte Carlo algorithms have a fixed time bound but may err with probability ≤ δ. You can reduce Monte Carlo error exponentially by repeating independently and aggregating (amplification), while cost grows only linearly.
In many algorithmic problems, the hardest part is worst-case structure: an adversarial input can force a deterministic algorithm to take a very slow or very complicated path.
Randomness is a way to avoid being “predictable.” Instead of choosing a specific pivot, a specific sample, or a specific hash function deterministically, we choose it at random. This can defeat adversarial inputs (no single input is bad for most random choices), and it often makes algorithms simpler and faster.
Randomized algorithms appear everywhere: quicksort with random pivots, randomized hashing, randomized load balancing, randomized primality testing, and many streaming/sketching methods.
A deterministic algorithm is a function A(x) that maps an input x to an output y.
A randomized algorithm is better viewed as a function of two inputs: the actual input x and a string of random bits r. We write A(x; r).
For a fixed input x, as r varies uniformly over {0,1}ᵏ (for however many bits are used), the output and runtime become random variables.
Let Y(x; r) denote the output and T(x; r) the running time of A on input x with random bits r. Then for a fixed x, both Y(x; r) and T(x; r) are random variables over the choice of r.
We use the probability operator Pr[·] to indicate probability over the algorithm’s random bits, e.g. Pr_r[A(x; r) is correct].
The key point: randomness moves us from single-value statements to probabilistic guarantees.
Randomized algorithms are typically analyzed by answering one (or both) of these questions: How is the running time T(x; r) distributed (expectation, tail bounds)? And with what probability is the output correct?
Different randomized algorithms put the “randomness” on different sides (runtime vs correctness). That’s exactly what the Las Vegas vs Monte Carlo distinction captures.
Another helpful viewpoint is:
Each random string r selects a deterministic behavior Aᵣ(x) = A(x; r). Then analysis is about the average (or tail) behavior over r.
This viewpoint is extremely useful when you reason about adversaries: if an adversary fixes x first, and then you sample r, you can often guarantee good performance in expectation over r (even if some r are “bad”).
Here’s the core distinction we’ll build up carefully:
| Type | Time bound | Correctness | Typical analysis |
|---|---|---|---|
| Las Vegas | Random variable | Always correct | E[T], or Pr[T ≤ …] |
| Monte Carlo | Deterministic bound (or high-probability bound) | Error ≤ δ | Pr[error] ≤ δ, amplification |
We’ll now study each category in depth and learn how amplification turns “somewhat reliable” Monte Carlo answers into “overwhelmingly reliable” ones.
Not all randomness is used for the same purpose. Sometimes the randomness shows up in the running time (the answer is always right, but how long it takes is random), and sometimes in the correctness (the time is bounded, but the answer may be wrong). Those are qualitatively different tradeoffs, so we name them.
A Las Vegas randomized algorithm satisfies: for every input x, Pr_r[A(x; r) is correct] = 1; only the running time is random.
So the algorithm might sometimes take longer, but it never lies.
The most common guarantee is expected time: E[T(x; r)] ≤ O(f(n)), where n = |x| is the input size.
Sometimes we also want tail bounds, e.g. Pr[T(x; r) ≥ c · f(n)] ≤ small for a suitable constant c.
But expectation is the classic baseline.
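To make “expected running time” concrete, here is a minimal toy sketch (my own illustration, not from the text): a process that flips a fair coin until it lands heads. It always succeeds, and the number of flips is a geometric random variable with expectation 2.

```python
import random

def flips_until_heads(rng: random.Random) -> int:
    """Flip a fair coin until it lands heads; return the number of flips.

    The outcome (a heads was found) is always correct; only the
    running time is random, with E[flips] = 2."""
    flips = 1
    while rng.random() >= 0.5:  # tails with probability 1/2: flip again
        flips += 1
    return flips

# The empirical average over many trials approaches the expectation 2.
rng = random.Random(0)
avg = sum(flips_until_heads(rng) for _ in range(100_000)) / 100_000
```

The loop may in principle run for a long time, but long runs are exponentially unlikely, which is exactly the shape of a typical Las Vegas time bound.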
The key property is that the algorithm can verify correctness of what it outputs, or it’s built so it cannot output an incorrect answer.
A Monte Carlo randomized algorithm satisfies: for every input x, the running time is bounded by a fixed function of n, and Pr_r[A(x; r) is correct] ≥ 1 − δ.
Equivalently: Pr_r[A(x; r) is wrong] ≤ δ.
Here δ is an error probability you can often tune.
Monte Carlo algorithms come in two flavors: one-sided error (the algorithm can err only in one direction, e.g. false negatives but never false positives) and two-sided error (it can err in either direction).
One-sided error is often easier to amplify because the “safe” answer is always trustworthy.
| Feature | Las Vegas | Monte Carlo |
|---|---|---|
| Output correctness | Always correct | Correct w.p. ≥ 1 − δ |
| Runtime | Random variable | Usually fixed upper bound |
| Main knob | Expected time | Error probability δ |
| Typical use when… | You can verify correctness or ensure correctness structurally | Verification is expensive; approximate/decision is OK |
Let’s restate both definitions using the A(x; r) notation.
For a randomized algorithm A(x; r):
Las Vegas: for all x, Pr_r[A(x; r) is correct] = 1, and the analysis is about E[T(x; r)].
Monte Carlo: for all x, T(x; r) is bounded by a fixed function of n for every r, and Pr_r[A(x; r) is wrong] ≤ δ. The analysis is about δ.
Sometimes people casually say: “This randomized algorithm runs in O(n) time with high probability.” That statement alone doesn’t classify the algorithm.
To classify it, you must ask: Can the output ever be wrong (Monte Carlo), or is it always correct (Las Vegas)? And does the time bound hold for every run, only in expectation, or only with high probability?
It’s possible to have an algorithm where both runtime and correctness are probabilistic. In practice, we often convert such algorithms into one of the two standard forms by enforcing timeouts (turning it into Monte Carlo) or repeating-until-success with verification (turning it into Las Vegas).
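The two conversions can be sketched in a few lines of Python. The names `solve_once` and `verify` are hypothetical stand-ins for a randomized subroutine and a cheap checker; the 10% error rate is artificial.

```python
import random

def solve_once(x, rng):
    """Hypothetical randomized attempt; wrong 10% of the time."""
    return 2 * x if rng.random() < 0.9 else 2 * x + 1

def verify(x, answer):
    """Hypothetical cheap deterministic check of a proposed answer."""
    return answer == 2 * x

def las_vegas(x, rng):
    """Repeat-until-verified: always correct, random number of attempts."""
    while True:
        answer = solve_once(x, rng)
        if verify(x, answer):
            return answer

def monte_carlo(x, rng, budget=3):
    """Timeout version: at most `budget` attempts, may return a wrong answer."""
    answer = None
    for _ in range(budget):
        answer = solve_once(x, rng)
        if verify(x, answer):
            return answer
    return answer  # budget exhausted: possibly wrong (error <= 0.1**budget)
```

`las_vegas` trades bounded time for guaranteed correctness; `monte_carlo` trades a small error probability for a hard cap on work.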
You now have the taxonomy. Next we’ll learn the most powerful tool for Monte Carlo algorithms: amplification.
A Monte Carlo algorithm typically gives a guarantee like: Pr[A(x; r) is wrong] ≤ 1/3.
At first, 1/3 sounds uncomfortably large. But this is the key idea:
If you can run the algorithm multiple times with independent random bits, you can drive the error probability down exponentially while increasing time only linearly.
That tradeoff is one of the main reasons Monte Carlo algorithms are so useful.
Assume we have a Monte Carlo algorithm A(x; r) that outputs a decision bit (YES/NO).
Let one run have error probability: Pr[A(x; r) is wrong] ≤ p.
Usually p < 1/2 (e.g., p = 1/3).
We run A independently k times using fresh random bits r₁, r₂, …, r_k.
Let the outputs be Z₁ = A(x; r₁), Z₂ = A(x; r₂), …, Z_k = A(x; r_k).
Define an aggregated answer Z*.
Two common aggregations: majority vote of Z₁, …, Z_k (for two-sided error), and OR/AND of the runs (for one-sided error).
If each run is correct with probability ≥ 1 − p and p < 1/2, then the majority of k runs is wrong only if more than half the runs are wrong.
Let Xᵢ be the indicator random variable that run i is wrong:
Xᵢ = 1 if run i is wrong, else 0, so E[Xᵢ] = Pr[Xᵢ = 1] ≤ p.
Majority vote fails only if ∑ᵢ Xᵢ ≥ k/2, i.e. at least half the runs are wrong.
A standard result (Chernoff/Hoeffding bounds) implies:
Pr[∑ᵢ Xᵢ ≥ k/2] ≤ exp(−2k(1/2 − p)²).
So to get error ≤ δ, it suffices to take k = O(log(1/δ)) (for any fixed p < 1/2).
Even without proving the exact bound here, the intuition is strong: if each run is biased toward correctness, the probability that most runs are wrong drops exponentially.
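That intuition is easy to check empirically. The sketch below (illustrative; the noisy oracle is artificial) simulates a two-sided Monte Carlo algorithm with p = 1/3 and takes the majority of 31 independent runs.

```python
import random

def one_run(truth: bool, p: float, rng: random.Random) -> bool:
    """Single Monte Carlo run with two-sided error: wrong w.p. p."""
    return truth if rng.random() >= p else not truth

def majority(truth: bool, p: float, k: int, rng: random.Random) -> bool:
    """Aggregate k independent runs by majority vote."""
    yes = sum(one_run(truth, p, rng) for _ in range(k))
    return yes > k / 2

# With p = 1/3 per run, a single run is wrong a third of the time;
# the majority of k = 31 runs is wrong far less often.
rng = random.Random(0)
trials = 10_000
wrong = sum(not majority(True, 1/3, 31, rng) for _ in range(trials))
error_rate = wrong / trials
```

The measured error rate sits well below the single-run error of 1/3, and growing k pushes it down exponentially fast.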
Suppose A has one-sided error of this form: on NO instances it always answers NO (never a false positive), and on YES instances it answers YES with probability ≥ 1 − p (it misses with probability ≤ p).
Then you can amplify by repeating and taking OR: run A k times with fresh random bits and output YES iff at least one run outputs YES.
Why does this work?
If each run misses with probability ≤ p, and the runs are independent, then
Pr[all k runs miss] ≤ p · p ⋯ p = pᵏ.
So the error (on YES instances) becomes at most pᵏ, and it stays 0 on NO instances.
To achieve error ≤ δ:
Solve:
pᵏ ≤ δ
⇒ k · log p ≤ log δ
⇒ k ≥ log(δ) / log(p)
Because 0 < p < 1, log(p) is negative, so this is k ≥ log(1/δ) / log(1/p).
Again, k = O(log(1/δ)).
If one run costs time O(f(n)), then k repetitions cost O(k · f(n)).
If k = O(log(1/δ)), then the total time is O(f(n) · log(1/δ)).
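The bound k ≥ log(1/δ) / log(1/p) is easy to turn into a tiny helper. This is a sketch of that arithmetic only (not from the text):

```python
import math

def repetitions_needed(p: float, delta: float) -> int:
    """Smallest k with p**k <= delta, for one-sided OR/AND amplification.

    Derived from p**k <= delta, i.e. k >= log(1/delta) / log(1/p),
    assuming 0 < p < 1."""
    return math.ceil(math.log(1 / delta) / math.log(1 / p))

# e.g. single-run miss probability 1/4 and target delta = 1e-6
k = repetitions_needed(0.25, 1e-6)
```

Doubling the number of decimal digits of reliability only doubles k, which is the linear-cost/exponential-benefit tradeoff in action.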
This is the core tradeoff: error probability shrinks exponentially in k, while running time grows only linearly in k.
Amplification relies on the runs being independent (or close enough).
If you reuse the same random bits, you can get perfectly correlated outcomes: all k runs return the same answer, so repetition buys nothing.
So when we say “repeat independently,” we mean: use fresh, independent random bits r₁, …, r_k, one string per run.
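The seed-reuse failure mode is worth seeing once in code. This toy (the run itself is artificial) makes each run fully determined by its seed:

```python
import random

def run_with_seed(x: int, seed: int) -> bool:
    """One Monte Carlo run, fully determined by its seed.

    x plays the role of the input (unused in this toy); the return
    value True means 'correct', which happens w.p. 0.9 over the seed."""
    rng = random.Random(seed)
    return rng.random() < 0.9

# Reusing one seed: all "repetitions" are identical copies of a single
# outcome, so aggregating them cannot reduce the error at all.
reused = [run_with_seed(42, seed=7) for _ in range(5)]

# Fresh seeds give independent runs, which is what amplification needs.
fresh = [run_with_seed(42, seed=s) for s in range(5)]
```

If the single shared seed happened to be a “bad” one, every run is wrong, and majority vote or OR faithfully reports that wrong answer.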
Algorithm papers often present a Monte Carlo algorithm with a “constant error” like 1/3 because: any constant error below 1/2 can be pushed down to any target δ with only O(log(1/δ)) independent repetitions, so the exact constant is immaterial.
A common target is δ = 1/nᶜ for some constant c, so that a union bound across many algorithmic events still yields high overall success probability.
| Error type | Single-run guarantee | Repeat k times | Combine outputs | New error |
|---|---|---|---|---|
| One-sided (false negatives only) | Pr[miss] ≤ p | independent runs | OR | ≤ pᵏ |
| One-sided (false positives only) | Pr[false alarm] ≤ p | independent runs | AND | ≤ pᵏ |
| Two-sided | Pr[wrong] ≤ p < 1/2 | independent runs | Majority | ≤ exp(−Ω(k)) |
Amplification is the main “engineering knob” for Monte Carlo algorithms. Next we’ll connect these ideas to real algorithmic uses and how to reason about expected time vs bounded time.
Randomness is not just a trick: it is a systematic way to obtain simpler algorithms, better expected performance, and robustness against adversarial inputs.
But using randomness responsibly means being precise about what is guaranteed.
Example: choosing a pivot randomly in quicksort.
This yields a Las Vegas algorithm: the output is always a correctly sorted array, and only the running time (number of comparisons) is random, with expectation O(n log n).
Example: randomized hashing.
Instead of a fixed hash function that an adversary can exploit, select a hash function at random from a family.
Then performance guarantees are in probability over the choice of hash function (which is effectively r).
This connects to the “distribution over deterministic algorithms” view.
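One standard concrete family is the Carter–Wegman construction h(x) = ((a·x + b) mod p) mod m with p prime and a, b sampled at random. The sketch below assumes integer keys smaller than the chosen prime; it illustrates the idea rather than any particular library’s hashing.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2**31 - 1; keys must be < P

def sample_hash(m: int, rng: random.Random):
    """Draw a random function from the family ((a*x + b) mod P) mod m.

    The choice of (a, b) is the random string r: an adversary who fixes
    the keys first cannot predict which function was drawn."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

rng = random.Random(0)
h = sample_hash(16, rng)
buckets = [h(key) for key in (3, 141, 59, 26)]  # each in range(16)
```

Performance statements ("expected O(1) probes per lookup") are then probabilities over the draw of (a, b), not over the keys.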
Example: primality testing. Tests such as Miller–Rabin are Monte Carlo with one-sided error: primes always pass, while a composite is detected with probability at least 3/4 per round, so k independent rounds leave error at most 4⁻ᵏ.
Many streaming algorithms rely on random sampling to estimate counts, norms, or heavy hitters.
These are often Monte Carlo: they provide approximate answers with high probability.
When you see a claim, categorize it: Is the correctness guaranteed or probabilistic? Is the runtime a worst-case bound, an expectation, or a high-probability bound? And what randomness is the probability taken over?
A good habit is to explicitly write: Pr_r[·] and E_r[·], naming the random bits r that the statement quantifies over.
This makes it crystal clear what is random and what’s guaranteed.
If you can verify an answer efficiently, you can do: repeat (draw fresh r, run A(x; r)) until the output passes verification, then return it.
Then correctness becomes 1, and time becomes random.
This is a standard way to construct Las Vegas algorithms: random proposal + deterministic check.
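A classic instance of propose-and-verify (my own minimal sketch): find an index of a marked entry in an array where at least half the entries are marked.

```python
import random

def find_marked(arr, rng: random.Random) -> int:
    """Las Vegas search: return an index i with arr[i] == 1.

    Random proposal + deterministic check. If at least half the entries
    are marked, each probe succeeds w.p. >= 1/2, so E[probes] <= 2.
    The returned index is checked first, so the output is never wrong."""
    while True:
        i = rng.randrange(len(arr))  # propose a random index
        if arr[i] == 1:              # verify before returning
            return i

arr = [0, 1] * 500                   # exactly half the entries are marked
idx = find_marked(arr, random.Random(0))
```

A deterministic scan also works here, but the same propose-and-verify pattern generalizes to settings where no fast deterministic search is known.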
If a Las Vegas algorithm has a heavy tail (rare long runs), you can: impose a time budget and, when it expires, stop and return a possibly-incorrect answer or an explicit failure.
Now you have bounded time, but a chance of failure/error.
This can be useful in real-time systems.
In theory, r is perfectly random. In practice: r comes from pseudorandom number generators seeded with limited entropy, or from hardware entropy sources.
Most randomized algorithm analyses assume truly independent bits; engineering approximates that assumption well enough for most applications.
Randomization is a gateway to several deeper topics: concentration inequalities (Chernoff/Hoeffding), randomized hashing, streaming and sketching algorithms, and derandomization.
The core conceptual toolkit you need is: the A(x; r) view, indicator variables plus linearity of expectation, independence, and amplification.
Once those are solid, the rest is about applying them in specific domains.
We sort n distinct numbers using randomized quicksort: pick a pivot uniformly at random from the current subarray, partition, then recurse. The algorithm is always correct, but the runtime depends on pivot choices.
Goal: show the expected number of comparisons is O(n log n).
Define the random variable C = total number of element-to-element comparisons performed by quicksort on the input of size n.
Key idea: count comparisons by pairs.
For each pair of elements (i, j) with i < j in sorted order, define an indicator variable:
Xᵢⱼ = 1 if elements i and j are ever compared, else 0.
Then:
C = ∑_{i<j} Xᵢⱼ
Take expectations:
E[C] = E[ ∑_{i<j} Xᵢⱼ ]
= ∑_{i<j} E[Xᵢⱼ]
= ∑_{i<j} Pr[Xᵢⱼ = 1]
Compute Pr[Xᵢⱼ = 1].
In quicksort, i and j are compared iff the first pivot chosen from the set {i, i+1, …, j} is either i or j.
Because the pivot is chosen uniformly at random among the elements in that set when that subproblem is first formed:
Pr[Xᵢⱼ = 1] = 2 / (j − i + 1).
Plug back in:
E[C] = ∑_{i<j} 2 / (j − i + 1)
Let d = j − i (distance in sorted order), so j − i + 1 = d + 1.
For each d ∈ {1, …, n−1}, there are (n − d) pairs at distance d.
So:
E[C] = ∑_{d=1}^{n−1} (n − d) · 2/(d + 1)
Upper bound it:
E[C] = 2 · ∑_{d=1}^{n−1} (n − d)/(d + 1)
≤ 2 · ∑_{d=1}^{n−1} n/(d + 1)
= 2n · ∑_{d=1}^{n−1} 1/(d + 1)
= 2n · ∑_{k=2}^{n} 1/k
Use the harmonic series estimate:
∑_{k=2}^{n} 1/k = O(log n)
Therefore:
E[C] = O(n log n).
Insight: This is quintessential Las Vegas analysis: correctness is 1, but performance is random. Instead of tracking complex recursion shapes, we used linearity of expectation on carefully chosen indicator variables.
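The E[C] ≤ 2n ln n bound is easy to test experimentally. The sketch below implements randomized quicksort only to count comparisons (one comparison per non-pivot element at each partition step, matching the pair-counting convention above); it is an illustration, not an efficient sort.

```python
import math
import random

def qs_comparisons(arr, rng: random.Random) -> int:
    """Randomized quicksort on distinct keys; return the number of
    element-to-element comparisons C (each non-pivot element is
    compared to the pivot once per partition step)."""
    if len(arr) <= 1:
        return 0
    pivot = arr[rng.randrange(len(arr))]
    left = [x for x in arr if x < pivot]
    right = [x for x in arr if x > pivot]
    return (len(arr) - 1) + qs_comparisons(left, rng) + qs_comparisons(right, rng)

# The analysis bounds E[C] by about 2n * H_n ~ 2n ln n; an average over
# a few runs for n = 1000 should land comfortably under that bound.
rng = random.Random(0)
n = 1000
avg = sum(qs_comparisons(list(range(n)), rng) for _ in range(20)) / 20
bound = 2 * n * math.log(n)
```

Note that the input order is irrelevant here: because the pivot index is random, even a sorted input gives the same comparison-count distribution.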
Suppose A(x; r) runs in time O(f(n)) and returns a YES/NO answer with two-sided error:
Pr[A is correct] ≥ 2/3.
We want a new algorithm with error probability ≤ 2⁻²⁰, and we want to compute how many repetitions are needed.
Let p = Pr[A is wrong] ≤ 1/3 for a single run.
Run A independently k times to get outputs Z₁, …, Z_k.
Return the majority vote Z*.
Majority is wrong only if at least ⌈k/2⌉ runs are wrong.
Let Xᵢ indicate whether run i is wrong.
Then S = ∑ᵢ Xᵢ is the number of wrong runs.
We have E[Xᵢ] ≤ 1/3, so E[S] ≤ k/3.
We want Pr[S ≥ k/2].
This is a large deviation above the mean (k/2 is bigger than k/3). Chernoff bounds imply:
Pr[S ≥ k/2] ≤ exp(−Ω(k)).
A common concrete form for p = 1/3 yields:
Pr[majority wrong] ≤ exp(−k/18)
(you do not need to memorize the constant; the exponential-in-k behavior is the key).
Set exp(−k/18) ≤ 2⁻²⁰.
Take natural logs:
−k/18 ≤ ln(2⁻²⁰) = −20 ln 2
⇒ k/18 ≥ 20 ln 2
⇒ k ≥ 360 ln 2
Compute 360 ln 2 ≈ 360 · 0.693 ≈ 249.5.
So k = 250 repetitions suffice under this bound.
Total time becomes O(k · f(n)) = O(f(n) · 250), i.e. only a constant-factor slowdown for astronomically small error.
Insight: Amplification converts a constant-bias coin (2/3 correct) into an extremely reliable decision procedure. The time cost is linear in repetitions, while error shrinks exponentially—one of the main reasons Monte Carlo algorithms are practical.
Assume a one-sided Monte Carlo algorithm for a YES/NO property: on NO instances it never outputs YES, and on YES instances it misses (fails to output YES) with probability at most p = 1/4.
We want overall miss probability ≤ δ = 10⁻⁶.
Repeat the algorithm independently k times.
Output YES if any run outputs YES (OR rule).
On NO instances, the algorithm never outputs YES, so the amplified algorithm is still never wrong on NO instances (error = 0 there).
On YES instances, error occurs only if all k runs miss:
Pr[error] ≤ pᵏ = (1/4)ᵏ.
Solve (1/4)ᵏ ≤ 10⁻⁶.
Take logs base 10:
k · log₁₀(1/4) ≤ −6.
Since log₁₀(1/4) = −log₁₀(4) ≈ −0.60206:
−0.60206 k ≤ −6
⇒ k ≥ 6 / 0.60206 ≈ 9.966.
So k = 10 repetitions suffice.
Runtime increases by a factor of 10; error drops to ≤ 10⁻⁶.
Insight: For one-sided error, amplification is especially clean: independence plus an OR/AND rule gives error pᵏ exactly (or at least ≤ pᵏ). This is often used in randomized algebra and primality testing.
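The whole worked example fits in a short simulation. The detector below is an artificial stand-in with exactly the one-sided behavior assumed above (never a false positive, misses a YES with probability 1/4):

```python
import random

def detect(is_yes: bool, rng: random.Random) -> bool:
    """One-sided run: never a false positive; misses a YES w.p. 1/4."""
    return is_yes and rng.random() < 0.75

def amplified(is_yes: bool, k: int, rng: random.Random) -> bool:
    """OR of k independent runs: miss probability drops to (1/4)**k."""
    return any(detect(is_yes, rng) for _ in range(k))

rng = random.Random(0)
# NO instances: still never a false alarm, no matter how many runs.
false_alarms = sum(amplified(False, 10, rng) for _ in range(1000))
# YES instances: with k = 10, misses become vanishingly rare
# ((1/4)**10 < 1e-6, matching the calculation above).
misses = sum(not amplified(True, 10, rng) for _ in range(1000))
```

The asymmetry is the point: the OR rule can only move answers from NO to YES, so the error-free side of the guarantee is preserved exactly.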
A randomized algorithm is naturally written as A(x; r): random bits r make output and/or runtime into random variables.
All probabilistic statements are taken over the internal randomness: Pr_r[·].
Las Vegas algorithms: always correct (Pr[correct] = 1) but have randomized running time; analyze E[T] and sometimes tail bounds.
Monte Carlo algorithms: have a fixed (or tightly bounded) runtime but may be wrong with probability δ; analyze Pr[error] ≤ δ.
Monte Carlo error comes in one-sided and two-sided forms; one-sided error often amplifies via simple OR/AND rules.
Amplification: k independent repetitions reduce error exponentially (≈ pᵏ for one-sided; exp(−Ω(k)) for majority vote), while time increases linearly by a factor k.
Independence of repetitions (fresh random bits) is essential—without it, amplification may provide no benefit.
In practice, many algorithms are presented with constant error (like 1/3) because you can tune δ to be as small as needed with O(log(1/δ)) repetitions.
Confusing Las Vegas and Monte Carlo: Las Vegas never returns an incorrect answer; Monte Carlo may, even if the error is tiny.
Assuming amplification works without independence (e.g., reusing the same seed or correlated randomness).
Treating an expected-time bound E[T] ≤ O(f(n)) as if it were a worst-case bound T ≤ O(f(n)) for every run.
Ignoring what the probability is taken over: Pr[·] here is over random bits r, not over a distribution of inputs unless explicitly stated.
Classification practice. For each statement, decide whether it describes (i) Las Vegas, (ii) Monte Carlo, or (iii) insufficient information.
A) “Always outputs a correct minimum spanning tree; expected runtime O(m log n).”
B) “Runs in O(n²) time and outputs the correct answer with probability at least 0.99.”
C) “Runs in expected O(n) time and is correct with probability 0.999.”
Hint: Las Vegas: correctness is 1. Monte Carlo: runtime is bounded but correctness < 1. If both are probabilistic, you may not have enough to classify cleanly without more detail.
A) Las Vegas (correctness is guaranteed; time analyzed in expectation).
B) Monte Carlo (fixed time bound; probabilistic correctness).
C) Insufficient information / mixed: correctness is not guaranteed (so not Las Vegas), and runtime is only in expectation (so not the clean Monte Carlo template). It’s a randomized algorithm with both time and correctness probabilistic unless more structure is given.
One-sided amplification calculation. A one-sided Monte Carlo algorithm has false-negative probability p = 0.1 (it may miss YES instances), and never has false positives. How many independent repetitions k are needed so that the amplified algorithm (OR of runs) has miss probability ≤ 10⁻⁹?
Hint: For OR amplification with one-sided error: Pr[miss] ≤ pᵏ. Solve pᵏ ≤ δ using logs.
We need (0.1)ᵏ ≤ 10⁻⁹.
Taking log base 10:
k · log₁₀(0.1) ≤ −9.
But log₁₀(0.1) = −1.
So −k ≤ −9 ⇒ k ≥ 9.
Answer: k = 9 repetitions.
Design choice. You have a Monte Carlo algorithm that runs in time O(n) and returns the correct answer with probability 2/3. You need overall error probability ≤ 1/n³.
Assuming independent repetitions and majority vote, give an asymptotic bound on the total runtime after amplification in terms of n.
Hint: You want δ = 1/n³. Majority-vote error drops like exp(−Ω(k)), so k = O(log(1/δ)) = O(log n). Multiply by O(n).
Target error δ = 1/n³.
We choose k = O(log(1/δ)) repetitions.
Compute:
log(1/δ) = log(n³) = 3 log n = O(log n).
Each run costs O(n), so total time is:
O(n · k) = O(n log n).
(Any constant factors depend on the specific Chernoff bound constants, but asymptotically it is O(n log n).)