O(log n) search in sorted array. Divide and conquer.
Self-serve tutorial - low prerequisites, straightforward concepts.
You have a sorted list of a million items and need to know whether a value is present. Scanning from left to right works—but it wastes the fact that the list is sorted. Binary search is the classic way to turn “sorted order” into a dramatic speedup: from O(n) down to O(log n).
Binary search repeatedly halves a sorted array’s remaining search interval. Maintain an interval [low, high] that always contains the target’s index if it exists. Check mid; if mid is too small, move low up; if too large, move high down. This takes O(log n) time and O(1) extra space.
You’re given an array A and a target value. You want to find the index of target (or decide it isn’t present).
The naive approach scans left to right until you hit the target or reach the end. That’s O(n). Binary search is a divide-and-conquer algorithm for searching in a sorted array. The key idea: each comparison lets you discard half of the remaining candidates.
Assume the array is sorted in nondecreasing order:
Nondecreasing allows duplicates (e.g., [1, 2, 2, 2, 5]). Binary search still works, but you must be clear on what you want when duplicates exist (any occurrence? first? last?). We’ll cover variants later.
Binary search maintains a search interval of indices where the target could still be.
- `low` = lowest index still possible
- `high` = highest index still possible

At all times, you try to preserve this loop invariant:
Invariant: If `target` appears in the array, then its index lies within `[low, high]`.
Each step picks the middle index mid, compares A[mid] to target, and throws away the half that can’t contain the target.
Before you code binary search confidently, you should be comfortable with:
- Integer division: computing `mid` with floor behavior (e.g., `(low + high) // 2`).
- Loop invariants: what stays true about `[low, high]` each iteration.

These aren’t “extra topics”—they are exactly where most binary search bugs come from.
Each iteration cuts the interval size roughly in half.
If the array length is n:
We stop when the interval becomes empty or we find the target. The smallest k such that n/2ᵏ ≤ 1 is k ≥ log₂(n). So the number of iterations is O(log n).
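As an illustrative check (not part of the algorithm itself), you can count iterations empirically and compare against ⌈log₂(n+1)⌉:

```python
import math

def search_iterations(n):
    """Count binary search iterations for a worst-case miss on an array of length n."""
    low, high = 0, n - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = low + (high - low) // 2
        low = mid + 1  # always go right: worst case for a target larger than everything
    return steps

for n in [1, 10, 1_000, 1_000_000]:
    # steps matches math.ceil(math.log2(n + 1))
    print(n, search_iterations(n), math.ceil(math.log2(n + 1)))
```

For a million elements this is just 20 comparisons, which is the whole point of exploiting sorted order.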
Binary search isn’t just “check middle and move left/right.” It’s a careful promise about what indices are still possible.
If you lose that promise (the invariant), you can skip past the target, loop forever, or return a wrong index.
So we’ll be explicit about the invariant and how each update preserves it.
We’ll use a closed interval [low, high] inclusive. Initialize:
- `low = 0`
- `high = n - 1`

Loop while the interval is non-empty:
- `low <= high`

Compute the middle:
- `mid = low + (high - low) // 2`

(The `low + (high - low)` form avoids overflow in languages with fixed-size integers; in Python it’s not necessary, but it’s a good habit.)
Compare:
- `A[mid] == target`: return `mid`
- `A[mid] < target`: discard the left half including `mid` → set `low = mid + 1`
- `A[mid] > target`: discard the right half including `mid` → set `high = mid - 1`

If the loop ends, return “not found” (often `-1`).
Let’s spell out what each case means.
Assume the array is sorted nondecreasing.
Case `A[mid] < target`: because the array is sorted, for every i ≤ mid, we have A[i] ≤ A[mid] < target.
So no index ≤ mid can hold target.
Therefore, if target exists, it must be in indices mid + 1 ... high. Setting low = mid + 1 preserves:
If target exists, it’s still in [low, high].

Case `A[mid] > target`: similarly, for every i ≥ mid, we have A[i] ≥ A[mid] > target.
So no index ≥ mid can hold target.
Therefore, if target exists, it must be in indices low ... mid - 1. Setting high = mid - 1 preserves the invariant.
We’re done.
Each iteration strictly shrinks the interval:
- `low` increases (to `mid + 1`), or
- `high` decreases (to `mid - 1`).

Since `low` and `high` are integers and the interval length can’t shrink forever without becoming empty, the loop terminates.
```python
def binary_search(A, target):
    low = 0
    high = len(A) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if A[mid] == target:
            return mid
        elif A[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```

Suppose A = [2, 5, 7, 12, 19, 23, 31], target = 19.
Trace:

- low=0, high=6 → mid=3, A[3]=12 < 19 → low=4
- low=4, high=6 → mid=5, A[5]=23 > 19 → high=4
- low=4, high=4 → mid=4, A[4]=19 → return 4

Notice how [low, high] always contained index 4 until we found it.
Binary search is famous for off-by-one errors. Most bugs come from mixing interval conventions or mishandling duplicates.
The classic formula is:
- `mid = (low + high) // 2`

In many languages, `low + high` can overflow if indices are large. A safer version is:
- `mid = low + (high - low) // 2`

Both choose the “lower middle” when there are two middles.
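A quick way to convince yourself (an illustrative check, not part of the algorithm):

```python
# Even-length interval [3, 6]: two middles (4 and 5); both formulas pick the lower one.
low, high = 3, 6
assert (low + high) // 2 == low + (high - low) // 2 == 4

# Odd-length interval [3, 7]: a unique middle at 5; both formulas agree.
low, high = 3, 7
assert (low + high) // 2 == low + (high - low) // 2 == 5
```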
There are two common interval conventions:
| Convention | Interval | Loop condition | Update when going right | Update when going left |
|---|---|---|---|---|
| Closed | [low, high] | low <= high | low = mid + 1 | high = mid - 1 |
| Half-open | [low, high) | low < high | low = mid + 1 or low = mid (depends) | high = mid |
For beginners, closed intervals are often easier to reason about for exact-match search.
The most important rule is: pick one convention and stay consistent.
If duplicates exist, the exact-match binary search above can return any index where A[mid] == target.
Sometimes you need a stable boundary, like:

- lower bound: the first index i with A[i] ≥ target
- upper bound: the first index i with A[i] > target
These are incredibly useful in practice (counting occurrences, insertion positions, range queries).
A common pattern uses a half-open interval [low, high):
- Initialize `low = 0`, `high = n`.
- While `low < high`:
  - `mid = low + (high - low) // 2`
  - If `A[mid] < target`, discard the left half including `mid` → `low = mid + 1`
  - Otherwise, `high = mid`

At the end, `low` is the smallest index with `A[low] ≥ target` (or `n` if none).
Why does high = mid (not mid - 1)? Because in half-open intervals, high is exclusive, and we want to keep mid as a candidate.
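The lower-bound loop above can be written as a short Python sketch (`lower_bound` is our own helper name, following the lesson’s conventions):

```python
def lower_bound(A, target):
    """Return the smallest index i with A[i] >= target, or len(A) if none."""
    low, high = 0, len(A)          # half-open interval [low, high)
    while low < high:
        mid = low + (high - low) // 2
        if A[mid] < target:
            low = mid + 1          # everything up to mid is too small
        else:
            high = mid             # mid is still a candidate; keep it
    return low

print(lower_bound([1, 2, 2, 2, 5], 2))  # → 1
```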
Same structure, just change the comparison:
- If `A[mid] <= target`, move `low = mid + 1`
- Otherwise, `high = mid`

Then the count of occurrences of target is `upper_bound(target) - lower_bound(target)`.
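A matching sketch for the upper bound (same loop; only the comparison changes, so equal elements are also discarded to the left):

```python
def upper_bound(A, target):
    """Return the smallest index i with A[i] > target, or len(A) if none."""
    low, high = 0, len(A)          # half-open interval [low, high)
    while low < high:
        mid = low + (high - low) // 2
        if A[mid] <= target:       # note <=: equal elements go left too
            low = mid + 1
        else:
            high = mid
    return low

print(upper_bound([1, 2, 2, 2, 5], 2))  # → 4
```

Combined with a lower-bound search, the count of 2 in [1, 2, 2, 2, 5] is 4 − 1 = 3.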
Binary search should handle edge cases (empty array, single element, target absent or outside the array’s range) without special-casing if your loop condition is correct.
For an empty array, `high = -1` → `low <= high` is false immediately → return -1.

If the array is sorted descending, the direction flips:
If `A[mid] < target`, you must go left (because values decrease to the right). If the array is sorted by a key (e.g., objects sorted by age), compare using that key consistently.
A useful way to stay sane is to think: you need a monotonic predicate (something that switches from false to true once). Lower/upper bound are built exactly on that principle.
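That predicate view can be captured in one generic routine (a sketch; `first_true` is a hypothetical helper name, not a standard API): given a predicate over `[lo, hi)` that flips from false to true once, find the first index where it is true.

```python
def first_true(lo, hi, pred):
    """Smallest k in [lo, hi) with pred(k) true, or hi if none.

    Requires pred to be monotonic: once true, it stays true."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if pred(mid):
            hi = mid           # mid might be the first true; keep it
        else:
            lo = mid + 1       # everything up to mid is false
    return lo

A = [2, 5, 7, 12, 19]
# A lower-bound search for 7, expressed as a predicate search:
print(first_true(0, len(A), lambda i: A[i] >= 7))  # → 2
```

Lower bound is `pred(i) = A[i] >= target`, upper bound is `pred(i) = A[i] > target`; both are instances of this one pattern.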
Binary search is the core primitive behind many “fast membership” tasks when data is stored sorted:
Time complexity is O(log n) per query, with O(1) extra space.
Using lower bound, you can find where to insert an element while maintaining sorted order.
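In Python, the standard library’s `bisect` module implements exactly these boundary searches, so in practice you rarely hand-roll them:

```python
import bisect

A = [1, 4, 6, 9, 13]
pos = bisect.bisect_left(A, 7)   # insertion point that keeps A sorted
print(pos)                        # → 3
A.insert(pos, 7)
print(A)                          # → [1, 4, 6, 7, 9, 13]
# bisect.insort_left(A, x) does the search and insert in one call
```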
This is common in:
If the array is sorted, you can count how many elements equal target or lie in a range [L, R].
- Count of elements equal to x: `ub(x) - lb(x)`
- Count in a range [L, R]: `lb(R+ε) - lb(L)` (conceptually), or more directly `ub(R) - lb(L)`

A powerful pattern is binary searching over a value space when you have a monotonic condition.
Example shape:
- You want the smallest `k` such that `feasible(k)` is true.
- If `feasible(k)` is true, then `feasible(k+1)` is also true (monotonic).

Then you can binary search k even if there is no array.
This is used for many optimization problems where you search the answer itself rather than an array.
This lesson node focuses on array search, but learning lower/upper bound sets you up for that broader technique.
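As a tiny illustration of searching a value space (our own example, not part of this lesson’s exercises): find the smallest k with k·k ≥ n, using the monotonic predicate `feasible(k) = (k*k >= n)`.

```python
def smallest_feasible(n):
    """Smallest k >= 0 with k*k >= n: binary search over values, no array."""
    low, high = 0, n            # invariant: the answer lies in [low, high]
    while low < high:
        mid = low + (high - low) // 2
        if mid * mid >= n:      # feasible(mid): once true, stays true as mid grows
            high = mid          # mid might be the answer; keep it
        else:
            low = mid + 1       # mid is too small, and so is everything below it
    return low

print(smallest_feasible(10))    # → 4 (since 3*3 = 9 < 10 <= 16 = 4*4)
```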
Array A is sorted: A = [1, 4, 6, 9, 13, 18, 21, 27]. Find target = 13 using binary search with [low, high].
Initialize:
low = 0
high = 7 (n-1)
Invariant: if 13 exists, it is in indices [0,7].
Iteration 1:
mid = low + (high - low)//2 = 0 + (7-0)//2 = 3
A[mid] = A[3] = 9
Compare: 9 < 13, so discard indices ≤ 3.
Update: low = mid + 1 = 4
Now interval is [4,7].
Iteration 2:
mid = 4 + (7-4)//2 = 4 + 3//2 = 5
A[5] = 18
Compare: 18 > 13, so discard indices ≥ 5.
Update: high = mid - 1 = 4
Now interval is [4,4].
Iteration 3:
mid = 4 + (4-4)//2 = 4
A[4] = 13
Compare: 13 == 13 → found.
Return index 4.
Insight: The invariant makes the updates feel inevitable: once A[mid] is too small, everything left of mid is too small because the array is sorted.
A = [1, 2, 2, 2, 3, 5, 5, 9]. Compute how many times target = 2 appears using lower_bound and upper_bound (half-open intervals).
Goal:
count(2) = upper_bound(2) - lower_bound(2)
where:

- lower_bound(2) = first index i with A[i] ≥ 2
- upper_bound(2) = first index i with A[i] > 2
Compute lower_bound(2):
Initialize low=0, high=8 (n)
While low < high:

- mid = 4, A[4] = 3. Is 3 < 2? No → high = mid = 4
- mid = 2, A[2] = 2. Is 2 < 2? No → high = 2
- mid = 1, A[1] = 2. Is 2 < 2? No → high = 1
- mid = 0, A[0] = 1. Is 1 < 2? Yes → low = mid + 1 = 1
Stop (low == high == 1). So lower_bound(2) = 1.
Compute upper_bound(2):
Initialize low=0, high=8
While low < high:

- mid = 4, A[4] = 3. Is 3 ≤ 2? No → high = 4
- mid = 2, A[2] = 2. Is 2 ≤ 2? Yes → low = mid + 1 = 3
- mid = 3, A[3] = 2. Is 2 ≤ 2? Yes → low = 4
Stop (low == high == 4). So upper_bound(2) = 4.
Compute count:
count = upper_bound - lower_bound = 4 - 1 = 3
Insight: Lower/upper bound treat equality differently. That tiny change ( < vs ≤ ) is what turns “find a match” into “find a boundary,” which is often more useful than returning any one index.
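You can cross-check this trace against Python’s `bisect` module, which implements the same boundaries (`bisect_left` is lower bound, `bisect_right` is upper bound):

```python
from bisect import bisect_left, bisect_right

A = [1, 2, 2, 2, 3, 5, 5, 9]
lb = bisect_left(A, 2)    # → 1
ub = bisect_right(A, 2)   # → 4
print(lb, ub, ub - lb)    # → 1 4 3
```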
A = [3, 8, 10, 14, 17]. Search for target = 9 using the closed-interval exact-match binary search.
Initialize:
low=0, high=4
Iteration 1:
mid = 0 + (4-0)//2 = 2
A[2]=10
10 > 9 → high = mid - 1 = 1
Interval becomes [0,1]
Iteration 2:
mid = 0 + (1-0)//2 = 0
A[0]=3
3 < 9 → low = mid + 1 = 1
Interval becomes [1,1]
Iteration 3:
mid = 1 + (1-1)//2 = 1
A[1]=8
8 < 9 → low = mid + 1 = 2
Interval becomes [2,1] (empty because low > high)
Loop ends because low <= high is false.
Return -1 (not found).
Insight: The empty interval (low > high) is the precise signal that there is no index left where target could be, so “not found” is justified—not just a guess.
Binary search requires a sorted array (nondecreasing is the usual assumption).
Maintain a search interval: with a closed interval, the loop invariant is “if target exists, it’s in [low, high].”
Use mid = low + (high - low) // 2 to avoid overflow and to get correct integer rounding.
Closed interval exact-match pattern: while low <= high, then update with low = mid + 1 or high = mid - 1.
Binary search runs in O(log n) time because each comparison discards about half the remaining candidates.
With duplicates, exact-match may return any occurrence; use lower_bound / upper_bound to get first/last positions and counts.
Be consistent about interval conventions ([low, high] vs [low, high)) to avoid off-by-one errors.
Using the wrong loop condition for your interval convention (e.g., using low < high with updates meant for [low, high]).
Updating to low = mid or high = mid in a closed-interval exact-match search, which can cause infinite loops when low == mid or high == mid.
Computing mid = (low + high) // 2 in a language where low + high can overflow, leading to negative or incorrect mid.
For duplicates, expecting the exact-match search to return the first (or last) occurrence without implementing the boundary-search variant.
Implement the closed-interval binary search that returns the index of target in a sorted array A, or -1 if not found. Then trace it on A = [1, 2, 4, 8, 16] with target = 8.
Hint: Use low=0, high=n-1, loop while low <= high, and update low=mid+1 or high=mid-1.
Algorithm:

- Set low = 0, high = len(A) - 1.
- While low <= high:
  - mid = low + (high - low) // 2
  - If A[mid] == target, return mid.
  - If A[mid] < target, set low = mid + 1; else set high = mid - 1.
- Return -1.
Trace:
A=[1,2,4,8,16], target=8
low=0 high=4 mid=2 A[2]=4<8 → low=3
low=3 high=4 mid=3 A[3]=8==8 → return 3
Given A = [1, 2, 2, 2, 3, 3, 10], find (a) lower_bound(3) and (b) upper_bound(3), and use them to compute how many 3s are in A.
Hint: Lower bound uses condition A[mid] < target; upper bound uses A[mid] <= target. Use a half-open interval [low, high) with high=n.
A has length n = 7. (a) lower_bound(3) = 4, the first index with A[i] ≥ 3 (A[4] = 3). (b) upper_bound(3) = 6, the first index with A[i] > 3 (A[6] = 10).
Count of 3s = 6 - 4 = 2.
You are given a sorted array A (nondecreasing) and a target. Modify binary search to return the index of the FIRST occurrence of target (or -1 if absent). Describe the rule for updating boundaries when you find equality.
Hint: When A[mid] == target, don’t stop immediately; keep searching left while preserving that the first occurrence is still in the interval.
One approach (half-open lower_bound style):
Compute i = lower_bound(A, target) (first index with A[i] ≥ target). If i < n and A[i] == target, return i else return -1.
If you implement directly with a closed interval:
Keep a variable `ans = -1`. When `A[mid] == target`, record `ans = mid` and continue left with `high = mid - 1`; when `A[mid] < target`, set `low = mid + 1`; when `A[mid] > target`, set `high = mid - 1`. Return `ans` at the end.
Key rule: on equality, shrink toward the side where the first occurrence could be (left), rather than terminating.
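The closed-interval version described above might look like this in Python (a sketch; `find_first` is our own name):

```python
def find_first(A, target):
    """Index of the first occurrence of target in sorted A, or -1 if absent."""
    low, high = 0, len(A) - 1
    ans = -1
    while low <= high:
        mid = low + (high - low) // 2
        if A[mid] == target:
            ans = mid              # record a match...
            high = mid - 1         # ...but keep searching to the left
        elif A[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return ans

print(find_first([1, 2, 2, 2, 3], 2))  # → 1
```

Note the loop still shrinks the interval every iteration, so termination is preserved even though we no longer return on the first match.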