Dijkstra, Bellman-Ford algorithms. Minimum cost paths.
Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.
When you ask “what’s the cheapest way to get from s to every other node?”, you’re really asking for a proof-backed algorithm that turns local edge information into globally optimal paths. Shortest paths is where greedy choices sometimes work (Dijkstra) and sometimes fail (negative edges), forcing a different strategy (Bellman–Ford).
Shortest-path distance from a source s to v is the minimum total weight among all s→v paths. The core operation is edge relaxation: try to improve d[v] using d[u] + w(u,v). With nonnegative weights, Dijkstra uses a priority queue and a greedy “finalize the smallest estimate” rule. With negative edges (but no reachable negative cycles), Bellman–Ford repeatedly relaxes all edges until convergence (or detects a negative cycle).
You’re given a weighted graph and a starting vertex (the source) s. Each edge (u,v) has a cost/weight w(u,v). You want, for every vertex v, the shortest-path distance δ(s,v): the minimum total cost of any path from s to v.
A path is a sequence of edges:
s = v₀ → v₁ → … → vₖ = v
Its total cost is the sum of its edge weights:
w(P) = w(v₀,v₁) + w(v₁,v₂) + … + w(vₖ₋₁,vₖ)
Then the shortest-path distance is:
δ(s,v) = min{ w(P) : P is an s→v path }
If v is unreachable from s, we define δ(s,v) = ∞.
BFS works for unweighted graphs (or all edges cost 1), because “fewest edges” equals “minimum cost.” With weights, a path with more edges can be cheaper.
Example: s→a (cost 100), s→b (1), b→a (1). BFS might pick the 1-edge path s→a, but the cheapest path is s→b→a with cost 2.
So we need algorithms that reason about costs.
Most shortest-path algorithms maintain two arrays: a distance estimate d[v] (the current best-known cost of reaching v from s) and a predecessor pointer π[v] (the previous vertex on that best-known path).
Distances answer “how far,” predecessors answer “how to get there.”
To avoid hidden confusion, here are the working assumptions for the lesson:
1) The graph is directed; an undirected edge is modeled as two directed edges.
2) There is a single source s, and edge weights are real numbers.
3) For Dijkstra, all weights are nonnegative.
4) For Bellman–Ford, weights may be negative, but no negative cycle is reachable from s (and the algorithm detects one if it is).
These assumptions are the difference between an algorithm that is correct vs. one that silently gives wrong answers.
Shortest paths rests on a basic necessary condition: for any edge (u,v), the true shortest distances must satisfy the triangle inequality
δ(s,v) ≤ δ(s,u) + w(u,v)
This holds because you can always reach u optimally and then take edge (u,v). Algorithms exploit this by repeatedly enforcing that inequality via relaxation.
Shortest paths feels global (“consider all paths”), but the algorithms work by applying a local rule repeatedly until it forces the global optimum.
That local rule is edge relaxation.
Suppose we currently believe d[u] is a good estimate of δ(s,u). Then taking edge (u,v) suggests a candidate distance to v: d[u] + w(u,v).
Relaxation says: if this candidate improves d[v], update it.
For a directed edge (u,v), relaxation looks like this in pseudocode:
RELAX(u, v):
    if d[u] + w(u,v) < d[v]:
        d[v] = d[u] + w(u,v)
        π[v] = u

Relaxation is just enforcing the inequality d[v] ≤ d[u] + w(u,v).
If your current d violates it, you fix d[v].
If eventually you reach a point where no edge can relax, then you have a stable set of estimates. Under the right conditions (depends on algorithm + weights), that stable point equals the true shortest distances.
Almost all shortest-path methods begin with the same initialization: d[s] = 0, d[v] = ∞ for every other vertex v, and π[v] = NIL for all v.
This encodes: “we know how to reach s at cost 0; everything else unknown.”
Take s→a (5), s→b (2), b→a (1). Initialize: d[s]=0, d[a]=∞, d[b]=∞.
Relax edges out of s: d[a]=5 (π[a]=s) and d[b]=2 (π[b]=s).
Now relax (b,a): candidate = d[b]+1 = 2+1=3 < d[a]=5 ⇒ update d[a]=3, π[a]=b
We discovered a cheaper 2-edge route beating a more direct expensive edge.
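The trace above can be sketched as a runnable relaxation routine. This is a minimal sketch; the dict-based `d` and `pi` representation is an illustrative choice, not part of the lesson's pseudocode:

```python
import math

def relax(u, v, w_uv, d, pi):
    """Relax edge (u, v): improve d[v] through u if possible.

    d maps vertex -> current distance estimate,
    pi maps vertex -> predecessor on the best-known path.
    Returns True if d[v] was improved.
    """
    if d[u] + w_uv < d[v]:
        d[v] = d[u] + w_uv
        pi[v] = u
        return True
    return False

# The example from the text: s→a (5), s→b (2), b→a (1)
d = {"s": 0, "a": math.inf, "b": math.inf}
pi = {"s": None, "a": None, "b": None}
relax("s", "a", 5, d, pi)   # d[a] = 5
relax("s", "b", 2, d, pi)   # d[b] = 2
relax("b", "a", 1, d, pi)   # d[a] = 3, π[a] = b
```

Note that `math.inf + w` is still `inf`, so relaxing out of an unreached vertex never updates anything, matching the ∞ convention.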
Relaxation is always the operation.
So when learning shortest paths, keep asking:
1) What relaxations are performed?
2) In what order?
3) When can we stop?
4) When are we allowed to “finalize” a distance?
If all edge weights are nonnegative, paths only get more expensive as you extend them. That single fact enables a powerful claim:
When the vertex with smallest current estimate d[·] is selected next, that estimate is already optimal.
This is the extract-min finalization principle.
Intuition: if v currently has the smallest tentative distance, any alternative path to v must go through some not-yet-finalized vertex x with d[x] ≥ d[v]. Adding a nonnegative edge from x onward can’t make a cheaper route to v.
Maintain a min-priority queue keyed by d[v]. Often we keep a set S of “finalized” vertices.
Pseudocode sketch:
DIJKSTRA(G, s):
    for v in V:
        d[v] = ∞
        π[v] = NIL
    d[s] = 0
    Q = min-priority-queue over V keyed by d[·]
    while Q not empty:
        u = extract-min(Q)
        for each edge (u,v) in Adj[u]:
            if d[u] + w(u,v) < d[v]:
                d[v] = d[u] + w(u,v)
                π[v] = u
                decrease-key(Q, v)

Let S be the set of extracted vertices so far. Dijkstra maintains the invariant: for every v ∈ S, d[v] = δ(s,v).
When we extract-min u, we claim no cheaper path to u exists via vertices not in S. Suppose for contradiction there is a cheaper path P to u that goes through some vertex x ∉ S. Consider the first vertex y on P that is not in S; its predecessor z is in S. When z was finalized, relaxation would set d[y] ≤ d[z]+w(z,y). Because weights are nonnegative, the remainder of the path from y to u can’t reduce the cost below d[y]. This would imply u shouldn’t have been the minimum. Contradiction.
The nonnegativity is used precisely when we say “continuing along the path cannot make it cheaper.” With negative edges, it absolutely can.
Let |V| = n, |E| = m. With a binary heap, Dijkstra runs in O((n + m) log n); with a Fibonacci heap, O(m + n log n); with a plain array, O(n²), which can win on dense graphs.
Given the prerequisite includes heaps, the binary heap bound is the one to internalize.
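As a concrete sketch, here is a heap-based Dijkstra in Python. Python's `heapq` has no decrease-key, so this version uses the common lazy-deletion equivalent: push duplicate entries and skip stale ones when popped. The adjacency-dict representation is an illustrative choice:

```python
import heapq
import math

def dijkstra(adj, s):
    """Dijkstra with a binary heap; assumes all weights are >= 0.

    adj: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns (d, pi): distance estimates and predecessor pointers.
    """
    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[s] = 0
    heap = [(0, s)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if dist_u > d[u]:        # stale entry: u was already finalized
            continue
        for v, w in adj[u]:
            if d[u] + w < d[v]:  # relax (u, v)
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, pi

# Small check on the earlier example: s→a (5), s→b (2), b→a (1)
adj = {"s": [("a", 5), ("b", 2)], "a": [], "b": [("a", 1)]}
d, pi = dijkstra(adj, "s")
# d == {"s": 0, "a": 3, "b": 2}
```

Lazy deletion pushes at most one heap entry per successful relaxation, which preserves the O((n + m) log n) bound.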
To reconstruct the path to t: start at t and follow predecessor pointers π[t], π[π[t]], … back to s, then reverse the sequence.
If there’s a reachable negative edge, the extract-min “finalization” can lock in a distance too early.
Tiny counterexample: s→a (2), s→b (3), b→a (-8).
Dijkstra extracts a first (d[a]=2) and finalizes it. Later, when b is processed, it discovers a cheaper route to a of cost -5, but a is already finalized, so the algorithm (in its standard form) will not correct it. The greedy proof breaks because weights aren’t nonnegative.
So: Dijkstra is fast, but only under the correct weight assumptions.
Negative edges break greedy finalization, but shortest paths can still be well-defined as long as there is no negative-weight cycle reachable from s.
A negative cycle means you can loop to reduce path cost without bound: each traversal of the cycle lowers the total, so no minimum exists.
So a correct algorithm must either restrict itself to inputs with no reachable negative cycle, or detect the cycle and report that distances are undefined.
Bellman–Ford does exactly that.
If there are no negative cycles reachable from s, then an optimal shortest path to any vertex can be chosen to be simple (no repeated vertices). A simple path in a graph with n vertices has at most n−1 edges.
This gives a natural plan:
1) Initialize d[s]=0, others ∞.
2) Repeat n−1 times: relax every edge once.
3) Do one more pass over all edges to detect negative cycles: if any edge can still relax, report a reachable negative cycle.
Pseudocode:
BELLMAN-FORD(G, s):
    for v in V:
        d[v] = ∞
        π[v] = NIL
    d[s] = 0
    for i = 1 to |V|-1:
        for each edge (u,v) in E:
            RELAX(u,v)
    for each edge (u,v) in E:
        if d[u] + w(u,v) < d[v]:
            return "negative cycle reachable"
    return d, π

Define δᵢ(s,v) = the shortest-path distance from s to v using at most i edges.
Claim: after i full passes of relaxing all edges, d[v] = δᵢ(s,v).
After n−1 passes, δₙ₋₁(s,v)=δ(s,v) when no negative cycles are reachable.
Bellman–Ford is slower: each of the n−1 passes relaxes all m edges, for a total time of O(nm).
But it buys you two capabilities Dijkstra doesn’t have: correctness in the presence of negative edge weights, and detection of reachable negative cycles.
In many graphs, distances converge before n−1 passes. You can track whether any relaxation happened in a pass; if none happened, stop early.
This doesn’t change worst-case big-O, but helps in practice.
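The pseudocode plus the early-stopping optimization can be sketched in Python. The edge-list representation and the `ValueError` signaling are illustrative choices:

```python
import math

def bellman_ford(edges, vertices, s):
    """Bellman-Ford over an edge list; negative weights allowed.

    edges: list of (u, v, w) triples.
    Returns (d, pi), or raises ValueError if a negative cycle is
    reachable from s. Stops early once a full pass changes nothing.
    """
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(d) - 1):          # at most |V|-1 passes
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:          # relax (u, v)
                d[v] = d[u] + w
                pi[v] = u
                changed = True
        if not changed:                  # no relaxation: converged, stop early
            break
    for u, v, w in edges:                # detection pass
        if d[u] + w < d[v]:
            raise ValueError("negative cycle reachable from source")
    return d, pi

# A small example with one negative edge:
edges = [("s", "a", 1), ("s", "b", 10), ("a", "c", 2),
         ("b", "c", -10), ("c", "d", 3)]
d, pi = bellman_ford(edges, "sabcd", "s")
# d == {"s": 0, "a": 1, "b": 10, "c": 0, "d": 3}
```

The `changed` flag implements the early-stop check described above without affecting the worst-case O(nm) bound.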
If v is unreachable, d[v] stays ∞ forever.
In the negative-cycle detection pass, edges out of unreachable vertices do not matter because their d[u]=∞, and ∞ + w won’t relax anything.
Bellman–Ford is the “safe general tool” for signed weights (no negative cycles). Dijkstra is the fast specialized tool for nonnegative weights.
The skill is choosing based on weight signs and required guarantees.
Picking the right shortest-path method is mostly about graph structure and edge weights.
| Setting | Recommended tool | Why |
|---|---|---|
| Unweighted (or all weights equal) | BFS | Layer-by-layer order equals minimal cost |
| Weights are 0 or 1 | 0–1 BFS | Deque gives O(\|V\|+\|E\|) |
| All weights ≥ 0 | Dijkstra + heap | Greedy finalization is correct and fast |
| Some weights negative, no negative cycle reachable | Bellman–Ford | Repeated relaxation converges |
| DAG (can have negative weights) | Topological DP + relax | No cycles, linear-time |
Even if you only implement Dijkstra and Bellman–Ford now, keeping the table in mind prevents misapplication.
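For the 0–1 BFS row, here is a deque-based sketch, assuming every weight is 0 or 1. A deque replaces the priority queue: 0-weight edges push to the front and 1-weight edges to the back, so vertices are popped in nondecreasing distance order:

```python
from collections import deque
import math

def zero_one_bfs(adj, s):
    """Shortest distances when every edge weight is 0 or 1.

    adj: dict mapping vertex -> list of (neighbor, weight) pairs,
    with weight in {0, 1}. Returns the distance dict d.
    """
    d = {v: math.inf for v in adj}
    d[s] = 0
    dq = deque([s])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            if d[u] + w < d[v]:      # relax (u, v)
                d[v] = d[u] + w
                if w == 0:
                    dq.appendleft(v)  # same distance layer: front
                else:
                    dq.append(v)      # next layer: back
    return d

# Example: s→a (1), s→b (0), b→a (0) — the 0-edges win
adj = {"s": [("a", 1), ("b", 0)], "b": [("a", 0)], "a": []}
d = zero_one_bfs(adj, "s")
# d == {"s": 0, "a": 0, "b": 0}
```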
Distances are useful, but applications often need the actual route.
Given π[·], reconstruct the s→t path: start at t, repeatedly follow π until you reach s, then reverse the collected vertices.
If you hit NIL before reaching s, then t was unreachable.
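A direct Python sketch of this reconstruction, with `None` playing the role of NIL:

```python
def reconstruct_path(pi, s, t):
    """Follow predecessor pointers from t back to s, then reverse.

    Returns the vertex list from s to t, or None if a NIL (None)
    pointer is hit before reaching s, i.e. t is unreachable.
    """
    path = [t]
    while path[-1] != s:
        prev = pi.get(path[-1])
        if prev is None:         # hit NIL before s: unreachable
            return None
        path.append(prev)
    path.reverse()
    return path

# π from the earlier small example, where s→b→a was cheapest:
pi = {"s": None, "b": "s", "a": "b"}
# reconstruct_path(pi, "s", "a") == ["s", "b", "a"]
```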
Shortest paths shows up inside flow algorithms in two major ways:
Basic max-flow (like Ford–Fulkerson) repeatedly finds an s→t path in a residual graph. If you choose augmenting paths carefully, you can improve runtime.
A common extension of flow problems is minimizing total cost while sending as much flow as possible. Those algorithms repeatedly compute shortest paths in a residual network where edge costs can be negative due to reverse edges.
That’s where understanding weight-sign applicability becomes essential: plain Dijkstra is unsafe on residual costs that can be negative, so implementations either run Bellman–Ford directly or reweight edges with potentials (Johnson-style reduced costs) so that Dijkstra applies again.
So shortest paths is not just a graph topic; it’s a core subroutine that powers more complex optimization.
This mental model transfers cleanly into flow, dynamic programming on graphs, and many ML/OR optimization formulations.
Directed graph with source s and edges: s→a (4), s→b (1), b→a (2), b→c (5), a→c (1), a→d (7), c→d (3).
All weights are ≥ 0, so Dijkstra applies.
Goal: compute shortest distances from s to all vertices and one shortest path to d.
Initialize: d[s]=0, all other distances ∞, all π = NIL.
Priority queue contains all vertices keyed by d.
Extract-min: u = s (d=0). Relax outgoing edges: s→a gives d[a]=4 (π[a]=s); s→b gives d[b]=1 (π[b]=s).
Now tentative distances: d[b]=1, d[a]=4, others ∞.
Extract-min: u = b (d=1). Relax outgoing edges: b→a gives candidate 1+2=3 < 4 ⇒ d[a]=3, π[a]=b; b→c gives d[c]=6, π[c]=b.
Now d[a]=3, d[c]=6, d[d]=∞.
Extract-min: u = a (d=3). Relax outgoing edges: a→c gives candidate 3+1=4 < 6 ⇒ d[c]=4, π[c]=a; a→d gives d[d]=10, π[d]=a.
Now d[c]=4, d[d]=10.
Extract-min: u = c (d=4). Relax outgoing edges: c→d gives candidate 4+3=7 < 10 ⇒ d[d]=7, π[d]=c.
Now d[d]=7.
Extract-min: u = d (d=7). No outgoing edges to improve anything. Done.
Final distances: d[s]=0, d[b]=1, d[a]=3, d[c]=4, d[d]=7.
Reconstruct path to d using π: π[d]=c, π[c]=a, π[a]=b, π[b]=s.
So path is s→b→a→c→d with total cost 1+2+1+3=7.
Insight: Notice how the best route to a was not the direct edge (cost 4) but via b (cost 3). Dijkstra discovers this through relaxation, and nonnegative weights ensure once a vertex is extracted, its distance never needs revision.
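To sanity-check the trace, it can be replayed with a compact heap-based Dijkstra (lazy deletion in place of decrease-key; the adjacency dict encodes the edge list read off the trace):

```python
import heapq
import math

# Edge list from the worked example
adj = {"s": [("a", 4), ("b", 1)],
       "b": [("a", 2), ("c", 5)],
       "a": [("c", 1), ("d", 7)],
       "c": [("d", 3)],
       "d": []}

d = {v: math.inf for v in adj}
pi = {v: None for v in adj}
d["s"] = 0
heap = [(0, "s")]
while heap:
    du, u = heapq.heappop(heap)
    if du > d[u]:                 # stale entry: u already finalized
        continue
    for v, w in adj[u]:
        if du + w < d[v]:         # relax (u, v)
            d[v], pi[v] = du + w, u
            heapq.heappush(heap, (d[v], v))

# d == {"s": 0, "b": 1, "a": 3, "c": 4, "d": 7}
# π chain for d: d ← c ← a ← b ← s
```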
Directed graph with source s and edges: s→a (1), s→b (10), a→c (2), b→c (-10), c→d (3).
There is a negative edge b→c, so Dijkstra is not safe. Use Bellman–Ford to compute shortest distances and show convergence.
Initialize: d[s]=0, all other distances ∞.
We will do |V|-1 = 4 passes (vertices: s,a,b,c,d ⇒ |V|=5).
Pass 1 (relax all edges in any order; we’ll list them):
1) Relax(s,a): d[a] = min(∞, 0+1)=1, π[a]=s
2) Relax(s,b): d[b] = min(∞, 0+10)=10, π[b]=s
3) Relax(a,c): d[c] = min(∞, 1+2)=3, π[c]=a
4) Relax(b,c): candidate = 10 + (-10) = 0 < d[c]=3 ⇒ d[c]=0, π[c]=b
5) Relax(c,d): d[d] = min(∞, 0+3)=3, π[d]=c
End of pass 1: d[a]=1, d[b]=10, d[c]=0, d[d]=3.
Pass 2:
1) Relax(s,a): candidate 0+1=1 (no change)
2) Relax(s,b): candidate 10 (no change)
3) Relax(a,c): candidate 1+2=3 (no change; d[c]=0 is better)
4) Relax(b,c): candidate 10-10=0 (no change)
5) Relax(c,d): candidate 0+3=3 (no change)
No changes occurred in this pass ⇒ we can stop early (optional optimization).
Negative-cycle check (one extra scan):
For each edge (u,v), verify that d[u] + w(u,v) < d[v] is false; here no edge can still relax, so no negative cycle is reported.
Final distances: d[s]=0, d[a]=1, d[b]=10, d[c]=0, d[d]=3.
Reconstruct d: π[d]=c, π[c]=b, π[b]=s ⇒ s→b→c→d.
Insight: Bellman–Ford succeeds because it does not finalize vertices early. The negative edge b→c can improve c even after c was already assigned a tentative value (3 via a). Repeated full-edge relaxations ensure these late improvements propagate.
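The passes can be replayed mechanically, relaxing edges in the listed order with early stopping:

```python
import math

# Edge list from the example, in the relaxation order used above
edges = [("s", "a", 1), ("s", "b", 10), ("a", "c", 2),
         ("b", "c", -10), ("c", "d", 3)]

d = {v: math.inf for v in "sabcd"}
pi = {v: None for v in "sabcd"}
d["s"] = 0
for _ in range(4):                      # |V|-1 = 4 passes
    changed = False
    for u, v, w in edges:
        if d[u] + w < d[v]:             # relax (u, v)
            d[v], pi[v] = d[u] + w, u
            changed = True
    if not changed:                     # pass 2 changes nothing: stop early
        break

# d == {"s": 0, "a": 1, "b": 10, "c": 0, "d": 3}
```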
Directed graph with source s and edges: s→a (1), a→b (1), b→a (-3).
This creates a cycle a→b→a with total weight 1 + (-3) = -2, reachable from s.
Run the Bellman–Ford logic conceptually to see detection.
Initialize d[s]=0, others ∞.
After relaxing enough times, d[a] and d[b] keep decreasing: from d[a]=1, d[b]=2, relaxing b→a gives d[a]=-1, then a→b gives d[b]=0.
Then again: b→a gives d[a]=-3, a→b gives d[b]=-2.
This can repeat indefinitely, lowering costs without bound.
Bellman–Ford after |V|-1 passes performs one more edge scan.
Because the cycle is negative, at least one edge on the cycle will still be able to relax:
For example, if currently d[b]=0, then b→a gives candidate -3 < current d[a], so a relaxation is still possible.
Algorithm reports a reachable negative cycle, meaning shortest-path distances are not well-defined (they are effectively −∞ for vertices reachable after looping).
Insight: The extra pass is not a technicality: it’s the mathematical line between “there exists a minimum” and “you can always do better by looping.”
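A tiny script showing the detection pass firing on a reachable negative cycle; the s→a weight of 1 is an illustrative choice for concreteness:

```python
import math

# a→b→a totals 1 + (-3) = -2, a reachable negative cycle
# (s→a weight 1 is an assumed value for illustration)
edges = [("s", "a", 1), ("a", "b", 1), ("b", "a", -3)]
vertices = ["s", "a", "b"]

d = {v: math.inf for v in vertices}
d["s"] = 0
for _ in range(len(vertices) - 1):      # |V|-1 = 2 passes
    for u, v, w in edges:
        if d[u] + w < d[v]:             # relax (u, v)
            d[v] = d[u] + w

# Extra pass: if any edge can still relax, a negative cycle is reachable
has_negative_cycle = any(d[u] + w < d[v] for u, v, w in edges)
# has_negative_cycle == True
```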
The shortest-path distance δ(s,v) is the minimum total weight over all s→v paths; unreachable vertices have distance ∞.
Relaxation is the core operation: if d[u] + w(u,v) improves d[v], update d[v] and set π[v]=u.
Dijkstra’s algorithm is correct only when all reachable edge weights are nonnegative; then extract-min finalizes a vertex’s distance.
With a binary heap and adjacency lists, Dijkstra runs in O((|V|+|E|) log |V|).
Bellman–Ford handles negative edges by relaxing all edges |V|−1 times, giving O(|V||E|) time.
If any edge can still relax after |V|−1 passes, a negative cycle reachable from s exists and shortest paths are undefined (−∞).
Predecessor pointers π[·] let you reconstruct actual shortest paths, not just distances.
Special cases matter: DAG shortest paths (topological order) and 0–1 BFS can outperform the general algorithms in their niches.
Running Dijkstra on graphs with negative edges (even a single reachable negative edge can invalidate the greedy finalization).
Forgetting that undirected edges must be treated as two directed edges; mixing representations can lead to missing relaxations.
Not using decrease-key (or an equivalent approach) with Dijkstra’s priority queue, leading to incorrect or slower implementations.
Misinterpreting Bellman–Ford’s negative-cycle result: it only guarantees detection of cycles reachable from s (unreachable negative cycles don’t affect δ(s,·)).
You have a directed graph with weights all ≥ 0 and you run Dijkstra from s. After extracting some set S of vertices, a vertex u is extracted next with smallest key d[u]. Explain (in 2–4 sentences) why d[u] cannot later be improved by a path that goes through any vertex outside S.
Hint: Use the idea: any alternative path to u must cross from S to V\S somewhere; weights are nonnegative so costs can’t drop after that crossing.
Consider any path P from s to u that uses a vertex outside S. Let (x,y) be the first edge on P with x ∈ S and y ∉ S. When x was extracted, d[x]=δ(s,x), and relaxing (x,y) would ensure d[y] ≤ d[x]+w(x,y). Since all weights are ≥ 0, the remaining suffix from y to u cannot make the total cost smaller than d[y], so u could not have a smaller true distance than the current minimum key d[u]. Thus d[u] is final when extracted.
Run Bellman–Ford for 3 passes on this graph (source s). Vertices: s,a,b,c. Edges: s→a (4), s→b (5), a→b (-2), b→c (3), a→c (10). Compute d[a], d[b], d[c] after each pass.
Hint: Initialize d[s]=0, others ∞. In each pass, relax edges in the listed order. Track updates carefully, especially from a→b (-2).
Initialization: d[a]=∞, d[b]=∞, d[c]=∞.
Pass 1:
1) Relax(s,a): d[a] = 0+4 = 4
2) Relax(s,b): d[b] = 0+5 = 5
3) Relax(a,b): candidate 4+(-2) = 2 < 5 ⇒ d[b]=2
4) Relax(b,c): d[c] = 2+3 = 5
5) Relax(a,c): candidate 4+10 = 14 (no change)
After pass 1: d[a]=4, d[b]=2, d[c]=5.
Pass 2: every candidate matches or exceeds the current value, so no changes.
After pass 2: d[a]=4, d[b]=2, d[c]=5.
Pass 3: identical, no changes.
So distances converge after pass 1.
Decide which algorithm to use (and why) for each case:
1) weights are {0,1}
2) DAG with some negative edges
3) general graph with a negative edge but you also need to detect negative cycles
4) road network with all distances positive
Hint: Match each case to the selection table: 0–1 BFS, DAG shortest paths, Bellman–Ford, Dijkstra.
1) 0–1 BFS, because weights are only 0/1 and you can use a deque for O(|V|+|E|).
2) DAG shortest paths via topological order + relaxation, because acyclicity guarantees correctness even with negative edges.
3) Bellman–Ford, because it supports negative edges and includes a negative-cycle detection pass.
4) Dijkstra, because all weights are nonnegative and it’s typically faster (O((|V|+|E|) log |V|) with a heap).
Next nodes you can tackle:
Related/background nodes to review if needed:
Nearby special-case extensions (good follow-ups):