Linear ordering of DAG vertices. Dependencies come before dependents.
Self-serve tutorial - low prerequisites, straightforward concepts.
When a project has dependencies—"compile before link", "foundation before walls", "parents before children"—you’re really asking for a linear order that respects directed constraints. Topological sort is the standard way to produce that order, as long as your dependency graph has no cycles.
A topological sort of a directed graph is a linear ordering of vertices such that for every edge u -> v, u appears before v. It exists iff the graph is a DAG. Two classic algorithms are: (1) Kahn’s algorithm (repeatedly remove indegree-0 nodes) and (2) DFS postorder (reverse finishing times). If you can’t find an indegree-0 node (or DFS finds a back edge), you’ve detected a cycle—meaning no valid dependency order exists.
Many real problems are phrased as dependencies:
A dependency is naturally modeled as a directed edge.
If you want to execute everything, you need a single sequence of all tasks that respects every edge.
That is exactly what topological sorting provides.
Let G = (V, E) be a directed graph.
A topological ordering of G is a linear ordering of all vertices:
v₁, v₂, …, vₙ
such that for every directed edge u -> v in E, u appears before v in the ordering.
A topological sort is an algorithm that returns such an ordering.
Key intuition: edges point “forward in time.” Topological order is a timeline that never violates an arrow.
A topological ordering exists if and only if the graph is a DAG (Directed Acyclic Graph).
then A must come before B, B before C, and C before A. That last constraint contradicts the first two. No linear order can satisfy all.
This “starting node exists” fact is not just intuitive—it’s the engine behind Kahn’s algorithm.
Topological orderings are often not unique.
Example:
Both A and B must precede C, but A and B can be swapped:
This reflects reality: when tasks are independent, you have scheduling freedom.
Topological sort ensures validity (dependency constraints), not:
Think of it as producing a feasible plan, not necessarily the optimal one.
If the graph is a DAG, there must be at least one vertex with indegree 0 (no incoming edges). Call these vertices sources.
Why must a source exist?
So in a DAG, at least one source exists. If you output a source first, you can safely remove it and continue. This yields a valid ordering.
Maintain:
Repeatedly:
At the end:
Consider tasks:
Compute indegrees:
Initialize queue with A,B,C.
One possible run:
Result: A, B, C, D, E, F
Another valid result might start with C, then A, etc. Kahn’s algorithm naturally allows multiple answers depending on how you choose among indegree-0 nodes.
Let n = |V|, m = |E|.
Total: O(n + m) time and O(n + m) space (for adjacency list, indegrees, queue).
If there is a directed cycle, no vertex in that cycle can ever reach indegree 0 (within the subgraph induced by remaining vertices). So the queue empties early.
A reliable check:
This is a practical benefit: Kahn’s algorithm is both a sorter and a cycle detector.
The container for indegree-0 vertices affects which valid ordering you get.
| Structure | What you get | Typical use |
|---|---|---|
| Queue (FIFO) | “Earlier discovered first” | simple scheduling |
| Stack (LIFO) | tends to go deep | sometimes convenient |
| Priority queue | smallest label first | deterministic / canonical-ish order |
All produce valid topological orders as long as you always remove a vertex with indegree 0.
You already know DFS as a traversal. For topological sort, the key observation is about finishing time.
During DFS, a node u “finishes” after exploring all outgoing edges u -> v and fully finishing those descendants v.
In a DAG, if there is an edge u -> v, then v must finish before u finishes (because DFS from u will (directly or indirectly) reach v, or v is explored in another DFS tree but still cannot create a back edge in a DAG).
So if we list nodes in decreasing finishing time, u will appear before v whenever u -> v.
That ordering is exactly a topological ordering.
order).order at the end.Equivalently: push u onto a stack at finish-time; popping the stack yields topo order.
DFS-based topo sort is typically paired with cycle detection using a 3-state visitation scheme:
When exploring an edge u -> v:
This is conceptually clean: a back edge means “u depends on something currently depending on u,” which is exactly the cycle contradiction.
Edges:
One DFS run:
Finish sequence (append at finish): [2, 4, 3, 1]
Reverse: [1, 3, 4, 2]
Check constraints:
Same as Kahn’s algorithm:
| Aspect | Kahn’s algorithm | DFS reverse postorder |
|---|---|---|
| Main concept | indegree-0 “available tasks” | finishing times / recursion |
| Cycle detection | output_count < n | detect back edges (GRAY→GRAY) |
| Produces order incrementally | yes (streaming) | order known after DFS completion |
| Good for scheduling/queues | excellent | okay |
| Implementation pitfalls | careful indegree updates | recursion depth / color logic |
If you’re thinking in terms of “what can I do next?”, Kahn’s algorithm matches that mental model.
If you’re already in DFS mode, reverse postorder is elegant and short.
Nodes are artifacts; edges encode prerequisites.
A topological order is a valid build order.
If a cycle exists (A depends on B depends on A), your build system should fail with a helpful cycle report.
Nodes are courses; edges are prerequisites.
Topological sorting provides a feasible sequence of courses.
Often you also have constraints like “at most 4 courses per term.” Topological sort is still the backbone, but then you add batching/leveling (a scheduling layer).
Data and model workflows are DAGs:
Orchestrators (Airflow, Dagster, Prefect) effectively maintain the set of indegree-0 tasks that are ready to run—Kahn’s idea with extra machinery (retries, resources, time).
A powerful pattern:
Why it works: when you process v, all prerequisites u with u -> v have already been processed.
Example: longest path in a DAG.
Let dp[v] = length of longest path ending at v.
Then
Processing in topological order ensures dp[u] is ready when computing dp[v].
In causal inference, DAGs represent directional assumptions (potential causal influence).
Topological ordering is not the full causal story, but it reinforces the key constraint: if u is a cause of v (directly or indirectly), u cannot come after v in a causal ordering.
A DAG defines a partial order: u ≤ v if there is a directed path u → … → v.
A topological ordering is a linear extension of that partial order.
This viewpoint helps explain non-uniqueness:
Given an ordering pos[·], you can verify it in O(n + m):
This is useful in testing: generate a topo order, then assert it respects all edges.
Topological sort is a “glue” primitive:
Once you see dependencies as edges, topo sort becomes the default tool.
Vertices: {A, B, C, D, E}
Edges:
A -> D
B -> D
B -> E
C -> E
D -> E
Goal: produce one topological ordering using Kahn’s algorithm.
Compute indegrees:
Initialize the set/queue of indegree-0 vertices:
Q = [A, B, C]
Output = []
Pop A (one of the available sources):
Output = [A]
Remove edges out of A:
Q now contains [B, C] (D not added because indegree[D] ≠ 0).
Pop B:
Output = [A, B]
Remove edges out of B:
Q becomes [C, D].
Pop C:
Output = [A, B, C]
Remove edges out of C:
Q becomes [D].
Pop D:
Output = [A, B, C, D]
Remove edges out of D:
Q becomes [E].
Pop E:
Output = [A, B, C, D, E]
Done: output_count = 5 = |V|, so this is a valid topological ordering.
Insight: The only rule is: always choose a vertex with indegree 0. The ordering is not unique; different choices among {A, B, C} at the start produce different (still valid) schedules.
Vertices: {1, 2, 3, 4}
Edges:
1 -> 2
2 -> 3
3 -> 1
3 -> 4
Try to topologically sort and observe what fails.
Compute indegrees:
Initialize indegree-0 set:
No vertex has indegree 0.
So Q = [] immediately.
Kahn’s algorithm stops with Output = [].
Since output_count = 0 < 4, we conclude: a directed cycle exists, so no topological ordering exists.
Interpretation:
The cycle 1 -> 2 -> 3 -> 1 requires 1 before 2 before 3 before 1, which is impossible in a linear order.
Edge 3 -> 4 additionally forces 3 before 4, but the cycle already makes the constraints inconsistent.
Insight: Cycle detection is not an extra feature bolted on—failure to find a new source is exactly the symptom of cyclic dependencies.
Run DFS topo sort on:
Edges:
A -> B
A -> C
C -> D
D -> B
We will compute finishing order and then reverse it.
Initialize all vertices WHITE.
order = []
Start DFS at A:
Mark A GRAY.
Explore A -> B:
order = [B]
Back to A, explore A -> C:
order = [B, D]
order = [B, D, C]
Finish A:
Mark A BLACK, append A
order = [B, D, C, A]
Reverse finishing list:
Topo order = [A, C, D, B]
Check edges:
A -> B (A before B) ✓
A -> C (A before C) ✓
C -> D (C before D) ✓
D -> B (D before B) ✓
Insight: In a DAG, an edge u -> v implies v finishes before u (in the sense needed for reverse postorder). If DFS ever sees an edge to a GRAY vertex, that’s a back edge and proves a cycle.
Topological ordering is a linear sequence where every edge u -> v has u before v.
A topological ordering exists iff the directed graph is a DAG (no directed cycles).
Kahn’s algorithm repeatedly removes indegree-0 vertices; if it can’t remove all vertices, a cycle exists.
DFS-based topo sort outputs vertices in reverse finishing time (reverse postorder); back edges (to GRAY) detect cycles.
Topological sorts are generally not unique; different choices among currently-available nodes yield different valid orders.
Both major algorithms run in O(n + m) time using adjacency lists.
Topo order is a backbone for DAG dynamic programming and real-world dependency scheduling.
Using topological sort on a graph that may contain directed cycles without checking for failure (you must detect/report cycles).
Mixing up edge direction: if u -> v means “u depends on v” vs “u is prerequisite for v,” you’ll reverse the intended order.
Forgetting to decrement indegree for every outgoing edge in Kahn’s algorithm (leading to missing nodes).
In DFS topo sort, appending nodes when they are discovered rather than when they finish (you need finishing times).
Given edges: A -> C, B -> C, C -> D, B -> E. List two different valid topological orderings.
Hint: A and B must both come before C; C must come before D; B must come before E. Try starting with A vs starting with B.
Two valid orderings are:
1) A, B, C, D, E
2) B, A, E, C, D
Checks:
Run Kahn’s algorithm on: 1 -> 3, 2 -> 3, 3 -> 4, 2 -> 5, 5 -> 4. Use a FIFO queue and break ties by smaller number first. Provide the resulting order.
Hint: Compute indegrees first. Initialize Q with all indegree-0 vertices in increasing order.
Indegrees:
Q=[1,2]
Pop 1 → output [1]; update 3:2→1
Pop 2 → output [1,2]; update 3:1→0 (push 3); update 5:1→0 (push 5)
Q=[3,5]
Pop 3 → output [1,2,3]; update 4:2→1
Pop 5 → output [1,2,3,5]; update 4:1→0 (push 4)
Pop 4 → output [1,2,3,5,4]
Final order: 1, 2, 3, 5, 4
A directed graph has vertices {A, B, C, D} and edges A -> B, B -> C, C -> A, C -> D. Using DFS color logic (WHITE/GRAY/BLACK), explain where the cycle is detected.
Hint: Start DFS at A. The cycle will appear when you traverse an edge to a GRAY vertex.
Start DFS(A): A becomes GRAY.
Follow A -> B: DFS(B), B becomes GRAY.
Follow B -> C: DFS(C), C becomes GRAY.
Now explore C -> A. Vertex A is currently GRAY (still in the recursion stack), so edge C -> A is a back edge.
A back edge in a directed graph implies a directed cycle; here it reveals the cycle A -> B -> C -> A.
Therefore no topological ordering exists.
Prerequisites you’re using here:
Next nodes this unlocks:
Related concepts to explore next: