BFS and DFS. Exploring all reachable nodes systematically.
Self-serve tutorial - low prerequisites, straightforward concepts.
Graphs show up whenever “things connect to other things”: friends in a social network, pages linked on the web, roads between cities, function calls in code. Graph traversal is the basic skill that lets you explore what’s reachable from a starting point—systematically, without getting stuck in cycles, and with predictable order.
Graph traversal means visiting every vertex reachable from a start vertex using a frontier (data structure of “discovered but not fully processed” vertices) plus a visited[v] marker to avoid revisits. BFS uses a FIFO queue (good for unweighted shortest paths in edges); DFS uses a LIFO stack/recursion (good for exploring deep structure and enabling algorithms like topological sort and SCCs).
Before you learn BFS/DFS, make sure these are solid:
1) Graph representations (you already know this)
2) Queue/Stack basics (quick refresh)
3) What “unweighted shortest path” means
4) Edge cases you must handle (important for correctness)
One key theme across everything you’re about to do: traversal is not “visit each vertex once by magic.” It’s a careful combination of a frontier (which vertex to process next) and a visited marker (which vertices never to re-add).
A graph traversal is a procedure that explores all vertices reachable from a given start vertex s, in a systematic order.
In a tree, you can “just” walk children pointers and never worry about coming back around. In a general graph, you can have cycles, multiple paths converging on the same vertex, and even self-loops.
Without a rule to avoid revisiting, you can loop forever.
Given a graph G = (V, E) and start vertex s, traversal explores exactly the set of vertices reachable from s: every v for which some path from s to v exists.
If there is no path from s to v, traversal from s will never visit v.
Traversal maintains a frontier: vertices that have been discovered but not fully processed.
The policy for extracting the next vertex from the frontier (FIFO vs LIFO) is a single design choice, but it drastically changes the order of exploration and the properties you can prove.
We maintain a boolean marker visited[v], set to true once v has been discovered.
This prevents infinite loops around cycles and duplicate work on vertices reachable by multiple paths.
A subtle but crucial detail is when you set visited[v]. The standard and safest approach is to mark v at the moment it is discovered, i.e., right before inserting it into the frontier.
If you delay marking until extraction, then parallel edges (or converging paths) can insert the same vertex many times, bloating runtime.
Here’s the shared structure both BFS and DFS follow:
1) Initialize all visited[v] = false
2) Create an empty frontier
3) Add s to frontier, set visited[s] = true
4) While frontier not empty:
   a) Extract a vertex u from the frontier
   b) For each unvisited neighbor v of u: set visited[v] = true and add v to the frontier
The only difference is the frontier policy (queue vs stack).
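The shared skeleton can be sketched in a few lines of Python. This is an illustrative sketch, not the tutorial’s required implementation: it assumes the graph is a dict mapping each vertex to a neighbor list, and it parameterizes only the extraction policy.

```python
from collections import deque

def traverse(adj, s, policy="bfs"):
    """Shared traversal skeleton: only frontier extraction differs.

    adj: dict mapping vertex -> list of neighbors (an assumed representation;
    any adjacency-list form works the same way).
    policy: "bfs" pops from the front (FIFO), "dfs" pops from the back (LIFO).
    Returns vertices in the order they leave the frontier.
    """
    visited = {v: False for v in adj}
    frontier = deque([s])
    visited[s] = True  # mark at discovery time
    order = []
    while frontier:
        u = frontier.popleft() if policy == "bfs" else frontier.pop()
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                frontier.append(v)
    return order
```

Running both policies on the same graph makes the difference visible: on adj = {0: [1, 2], 1: [3], 2: [3], 3: []} starting at 0, the FIFO policy yields [0, 1, 2, 3] while the LIFO policy yields [0, 2, 3, 1].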
Suppose you’re in a social network graph and want to find people closest to you:
You want to explore in “layers” of increasing hop count. That is exactly what BFS does.
BFS uses a queue, so vertices discovered earlier are processed earlier.
That enforces a level-by-level expansion from the start vertex s.
We often track a distance array dist[v] (optional but common), along with parent[v] for reconstructing paths.
Assume the graph is stored as adjacency lists Adj[u].
BFS(G, s):
    for each vertex v in V:
        visited[v] = false
        dist[v] = +∞
        parent[v] = NIL
    visited[s] = true
    dist[s] = 0
    Q = empty queue
    enqueue(Q, s)
    while Q not empty:
        u = dequeue(Q)
        for each v in Adj[u]:
            if visited[v] == false:
                visited[v] = true
                dist[v] = dist[u] + 1
                parent[v] = u
                enqueue(Q, v)

Claim (unweighted shortest paths): When BFS first discovers a vertex v, dist[v] is the length (in edges) of the shortest path from s to v.
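A direct Python transcription of the pseudocode may help make the claim concrete. This sketch assumes the graph is a dict mapping each vertex to its neighbor list:

```python
from collections import deque

def bfs(adj, s):
    """BFS computing dist (edge counts) and parent pointers.

    A transcription of BFS(G, s) above, assuming adj maps
    vertex -> list of neighbors.
    """
    dist = {v: float("inf") for v in adj}
    parent = {v: None for v in adj}
    visited = {v: False for v in adj}
    visited[s] = True
    dist[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True          # mark at discovery
                dist[v] = dist[u] + 1      # one layer farther out
                parent[v] = u
                Q.append(v)
    return dist, parent
```

On the worked example graph later in this tutorial (A-B, A-C, B-D, C-D, C-E, D-F, E-F, start A), this returns dist[D] = 2 and dist[F] = 3.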
Intuition: the queue processes vertices in nondecreasing distance order, so every vertex at distance k is discovered (from some vertex at distance k−1) before any vertex at distance k+1 enters the queue.
A useful view is the “BFS tree”: the parent pointers form a tree rooted at s, and the tree path from s to any discovered vertex v is a shortest path.
Let n = |V| and m = |E|. With adjacency lists, BFS runs in O(n + m): each vertex is enqueued and dequeued at most once, and each adjacency list is scanned once.
Sometimes you don’t care about shortest hop count. You care about structure:
DFS explores “as deep as possible” before backtracking.
1) Recursive DFS (uses the call stack implicitly)
2) Iterative DFS (uses an explicit stack)
They behave similarly, but iterative DFS avoids recursion depth limits.
DFS(G, s):
    for each vertex v in V:
        visited[v] = false
        parent[v] = NIL
    dfsVisit(s)

dfsVisit(u):
    visited[u] = true
    for each v in Adj[u]:
        if visited[v] == false:
            parent[v] = u
            dfsVisit(v)

To match the “mark when discovered” rule with an explicit stack:
DFS_iter(G, s):
    for each vertex v in V:
        visited[v] = false
        parent[v] = NIL
    S = empty stack
    visited[s] = true
    push(S, s)
    while S not empty:
        u = pop(S)
        for each v in Adj[u]:
            if visited[v] == false:
                visited[v] = true
                parent[v] = u
                push(S, v)

Note: This iterative version’s exact visitation order depends on neighbor ordering in Adj[u]. That’s normal.
DFS guarantees that every vertex reachable from s is visited exactly once.
DFS does not guarantee shortest paths.
Example intuition: if s has a direct edge to t but DFS happens to scan a longer branch first, it can reach t through many edges and mark it visited before the direct edge is ever considered.
Complexity is the same as BFS: O(n + m) with adjacency lists, since each vertex is visited once and each adjacency list is scanned once.
Self-loops and parallel edges get identical handling as in BFS: the visited marker makes repeated neighbor entries harmless.
Some DFS-based algorithms (topological sort, SCC) track timestamps: a discovery time recorded when dfsVisit(u) starts, and a finish time recorded when it returns.
You don’t need timestamps to traverse, but it’s helpful to know DFS has a natural notion of “finishing” a node after exploring its descendants.
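A minimal sketch of that bookkeeping, assuming a dict-of-neighbor-lists graph (the timestamp counter and field names here are illustrative, not a fixed convention):

```python
def dfs_timestamps(adj):
    """DFS over all vertices recording discovery/finish times.

    This is the bookkeeping that topological sort and SCC algorithms
    build on. adj: dict mapping vertex -> neighbor list.
    """
    time = 0
    disc, fin = {}, {}

    def visit(u):
        nonlocal time
        time += 1
        disc[u] = time          # u is discovered
        for v in adj[u]:
            if v not in disc:   # "not discovered" doubles as the visited check
                visit(v)
        time += 1
        fin[u] = time           # all descendants of u are done

    for v in adj:
        if v not in disc:
            visit(v)
    return disc, fin
```

A key property (used by topological sort): a vertex’s finish time is always larger than the finish times of everything discovered beneath it.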
They solve different kinds of problems. The frontier policy is the whole story.
| Goal | Prefer | Why |
|---|---|---|
| Explore in increasing number of edges from s | BFS | FIFO creates distance layers |
| Find shortest paths in an unweighted graph | BFS | First discovery gives shortest dist[v] |
| Detect cycles / reason about backtracking structure | DFS | Natural recursion stack / finish times |
| Topological sort (DAG ordering) | DFS | Uses finish times |
| Strongly connected components | DFS | Key building block (Kosaraju/Tarjan) |
| Just check reachability s → t | Either | Both are O(n+m); DFS may find a path quickly |
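The last row of the table deserves a concrete sketch: for a plain reachability check you can stop the moment t is discovered, with either frontier policy. This example uses a stack (DFS order) and assumes a dict-of-neighbor-lists graph:

```python
def reachable(adj, s, t):
    """Early-exit reachability check s -> t.

    Either frontier policy works; a stack (DFS order) is used here.
    adj: dict mapping vertex -> neighbor list.
    """
    if s == t:
        return True
    visited = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v == t:
                return True      # stop as soon as t is discovered
            if v not in visited:
                visited.add(v)
                stack.append(v)
    return False
```

Worst case this is still O(n + m), but on many graphs the early exit fires long before the whole reachable region is explored.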
Many bigger algorithms start with “run BFS/DFS from …”
So far we assumed a start node s. If you want to visit every vertex in the graph:
for each vertex v in V:
    if visited[v] == false:
        BFS(G, v)   // or DFS(G, v)

This produces a forest of BFS/DFS trees—one per connected component (undirected) or per reachable region (directed).
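The restart loop falls out naturally in code. This sketch groups vertices into components of an undirected graph (given, as assumed throughout these examples, as a dict of neighbor lists):

```python
from collections import deque

def bfs_forest(adj):
    """Cover every vertex by restarting BFS at each unvisited vertex.

    Returns a list of components, one per BFS tree. Assumes an
    undirected graph given as adj: vertex -> neighbor list.
    """
    visited = {v: False for v in adj}
    components = []
    for s in adj:
        if not visited[s]:
            comp = []                  # one new BFS tree starts here
            visited[s] = True
            Q = deque([s])
            while Q:
                u = Q.popleft()
                comp.append(u)
                for v in adj[u]:
                    if not visited[v]:
                        visited[v] = True
                        Q.append(v)
            components.append(comp)
    return components
```

For a graph with an edge 1-2 and an isolated vertex 3, this returns two components: [1, 2] and [3].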
Because adjacency lists can contain repeated neighbors (parallel edges), you should always structure the neighbor scan as: check visited[v] first, and only mark-and-add when it is false.
This keeps runtime tight: each vertex enters the frontier once.
Graph (undirected) with vertices {A,B,C,D,E,F}. Edges: A-B, A-C, B-D, C-D, C-E, D-F, E-F. Start s = A. Use BFS to compute dist[·] and parent[·], then reconstruct a shortest path from A to F.
Initialize:
visited[*]=false, dist[*]=∞, parent[*]=NIL
visited[A]=true, dist[A]=0
Q=[A]
Dequeue A. Neighbors: B, C.
B unvisited ⇒ visited[B]=true, dist[B]=1, parent[B]=A, enqueue B
C unvisited ⇒ visited[C]=true, dist[C]=1, parent[C]=A, enqueue C
Now Q=[B,C]
Dequeue B. Neighbors: A, D.
A already visited ⇒ ignore
D unvisited ⇒ visited[D]=true, dist[D]=2, parent[D]=B, enqueue D
Now Q=[C,D]
Dequeue C. Neighbors: A, D, E.
A visited ⇒ ignore
D is already visited (discovered via B) ⇒ ignore (important!)
E unvisited ⇒ visited[E]=true, dist[E]=2, parent[E]=C, enqueue E
Now Q=[D,E]
Dequeue D. Neighbors: B, C, F.
B visited ⇒ ignore
C visited ⇒ ignore
F unvisited ⇒ visited[F]=true, dist[F]=3, parent[F]=D, enqueue F
Now Q=[E,F]
Dequeue E. Neighbors: C, F.
C visited ⇒ ignore
F already visited ⇒ ignore
Q=[F]
Dequeue F. Neighbors: D, E (both visited). Done. Q=[]
Distances found:
dist[A]=0
dist[B]=1, dist[C]=1
dist[D]=2, dist[E]=2
dist[F]=3
Reconstruct shortest path A→F using parent pointers:
F has parent D
D has parent B
B has parent A
So path (reverse) is F←D←B←A ⇒ A→B→D→F
This path has 3 edges, matching dist[F]=3.
Insight: The moment F is first discovered, BFS has already guaranteed it used the fewest edges possible. Notice how the edge C-D did not change dist[D] after D was discovered; visited marking at discovery time prevents duplicates and preserves the first (shortest) discovery.
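The reconstruction step above generalizes into a small helper. This sketch assumes parent was filled in by a BFS from s, with parent[s] = None (represented here as Python's None rather than NIL):

```python
def reconstruct_path(parent, s, t):
    """Walk parent pointers from t back to s, then reverse.

    parent: dict filled in by a BFS from s (parent[s] is None).
    Returns the s -> t path as a list, or None if t was never reached.
    """
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = parent[v]
    path.reverse()
    # If t was unreachable, the walk never lands on s.
    return path if path and path[0] == s else None
```

Using the parent pointers computed above (parent[F]=D, parent[D]=B, parent[B]=A, parent[A]=NIL), this yields the path A→B→D→F.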
Directed graph with vertices {S, A, B, C, T}. Edges: S→A, S→T, A→B, B→C, C→T. Start at S. Run DFS (recursive) scanning neighbors in alphabetical order. Compare the path found to the shortest path in edges.
Adjacency lists (alphabetical):
Adj[S]=[A,T]
Adj[A]=[B]
Adj[B]=[C]
Adj[C]=[T]
Adj[T]=[]
Start dfsVisit(S):
visited[S]=true
From S, first neighbor is A (unvisited):
parent[A]=S
call dfsVisit(A)
At A:
visited[A]=true
neighbor B unvisited ⇒ parent[B]=A, dfsVisit(B)
At B:
visited[B]=true
neighbor C unvisited ⇒ parent[C]=B, dfsVisit(C)
At C:
visited[C]=true
neighbor T unvisited ⇒ parent[T]=C, dfsVisit(T)
At T:
visited[T]=true
Adj[T] empty ⇒ return
Unwind returns back to C, B, A, S.
Back at S, next neighbor is T, but visited[T]=true so it is skipped.
Insight: DFS found a path S→A→B→C→T (4 edges), even though there is a direct edge S→T (1 edge). DFS is about deep exploration, not shortest paths. Neighbor order can strongly influence what DFS finds first.
Directed graph with vertices {1,2,3}. Edges: 1→1 (self-loop), 1→2 (two parallel edges), 2→3. Start BFS at 1. Show the queue evolution and explain why it terminates cleanly.
Adj[1]=[1,2,2]
Adj[2]=[3]
Adj[3]=[]
Initialize visited[*]=false
visited[1]=true, Q=[1]
Dequeue 1. Scan neighbors:
v=1 (self-loop): already visited ⇒ ignore
v=2: unvisited ⇒ visited[2]=true, enqueue 2
v=2 again (parallel edge): now visited ⇒ ignore
Now Q=[2]
Dequeue 2. Neighbor v=3 unvisited ⇒ visited[3]=true, enqueue 3
Q=[3]
Dequeue 3. No neighbors. Q=[] stop.
Insight: Marking visited at discovery time ensures each vertex enters the frontier at most once, even if adjacency lists contain duplicates (parallel edges) or cycles (including self-loops).
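The same walkthrough can be replayed in code. This sketch (same assumed dict-of-neighbor-lists representation as earlier examples) returns the dequeue order, showing that duplicate neighbor entries never cause duplicate enqueues:

```python
from collections import deque

def bfs_order(adj, s):
    """BFS returning the dequeue order.

    Visited-at-discovery makes duplicate neighbor entries
    (self-loops, parallel edges) harmless: each vertex is
    enqueued at most once. adj: vertex -> neighbor list.
    """
    visited = {v: False for v in adj}
    visited[s] = True
    Q = deque([s])
    order = []
    while Q:
        u = Q.popleft()
        order.append(u)
        for v in adj[u]:
            if not visited[v]:   # self-loop and second parallel edge fail this check
                visited[v] = True
                Q.append(v)
    return order
```

With Adj[1]=[1,2,2], Adj[2]=[3], Adj[3]=[] and start 1, the order is exactly [1, 2, 3], matching the queue evolution traced above.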
Traversal explores all vertices reachable from a start vertex s; unreachable vertices require separate starts.
The frontier is the set of discovered-but-not-processed vertices; its extraction policy defines BFS vs DFS.
BFS uses a FIFO queue and explores in distance layers; it yields unweighted shortest-path distances in number of edges.
DFS uses a LIFO stack/recursion and explores deep before backtracking; it’s foundational for structural algorithms (topological sort, SCC).
Use a boolean marker visited[v] and (usually) set it when you discover v to avoid duplicates and guarantee termination.
With adjacency lists, both BFS and DFS run in O(|V| + |E|); with adjacency matrices, neighbor scanning often costs O(|V|²).
Self-loops and parallel edges do not break traversal if visited marking is done correctly—they just add redundant neighbor checks.
Marking visited[v] only when you pop/dequeue v, which can cause the same vertex to be added many times (especially with parallel edges).
Expecting DFS to produce shortest paths; only BFS guarantees shortest paths in unweighted graphs.
Forgetting that traversal from a single start vertex won’t visit disconnected parts of the graph (or unreachable regions in directed graphs).
Using an adjacency matrix but still assuming O(|V|+|E|) runtime; with a matrix you typically scan O(|V|) neighbors per vertex.
Run BFS on the undirected graph with vertices {0,1,2,3,4} and edges: 0-1, 0-2, 1-3, 2-3, 3-4. Start at 0. Compute dist[·] and one shortest path from 0 to 4 using parent pointers.
Hint: Initialize dist[0]=0. When you discover a neighbor v from u, set dist[v]=dist[u]+1 and parent[v]=u.
BFS layers:
Start: dist[0]=0
Discover 1 and 2 ⇒ dist[1]=1 parent[1]=0; dist[2]=1 parent[2]=0
Process 1 ⇒ discover 3 ⇒ dist[3]=2 parent[3]=1
Process 2 ⇒ 3 already visited
Process 3 ⇒ discover 4 ⇒ dist[4]=3 parent[4]=3
Shortest path 0→4: follow parents 4←3←1←0 ⇒ 0→1→3→4 (length 3).
Give a small directed graph where DFS from S visits T, but the DFS parent path from S to T is longer (more edges) than the shortest path. Explain briefly why BFS would avoid this.
Hint: Include both a direct edge S→T and a longer chain S→A→B→…→T, and order neighbors so DFS explores the chain first.
Example: vertices {S,A,B,T}. Edges: S→A, A→B, B→T, and S→T. If Adj[S]=[A,T], DFS explores S→A→B→T (3 edges) and marks T visited before returning to consider S→T. BFS would discover T from S immediately with dist[T]=1, which is the unweighted shortest path.
You want to traverse an entire graph that may be disconnected. Describe (in pseudocode-level English) how to use BFS or DFS to ensure every vertex is visited exactly once, even with self-loops and parallel edges.
Hint: Use an outer loop over all vertices; start a new traversal whenever you find an unvisited vertex; mark visited on discovery.
Set visited[v]=false for all v. For each vertex v in V: if visited[v]==false, start BFS/DFS from v. In the traversal, when scanning neighbors u of a popped vertex, only enqueue/push u if visited[u]==false, and immediately set visited[u]=true. This prevents repeats from cycles, self-loops, or parallel edges and guarantees each vertex is added to the frontier at most once.