Prefix trees for string storage. O(k) operations for length k strings.
Self-serve tutorial - low prerequisites, straightforward concepts.
When you type “aut…” and your editor suggests “auto”, “autocomplete”, and “automatic”, it’s doing something very trie-like: it’s exploiting the fact that many strings share prefixes, and it wants to find “everything that starts with this prefix” quickly.
A trie (prefix tree) stores strings by sharing common prefixes along root→node paths. Each edge is labeled by one character, and nodes can be marked as “terminal” to mean “a complete key ends here”. Insert/search/delete take O(k) time for a key of length k (independent of how many total keys you stored), plus some constant factors from how you represent children (array vs map).
Hash tables are great when you want to answer: “Is this exact key present?” in about O(1) average time. But they’re not naturally built for questions like:

- “Which stored keys start with this prefix?”
- “What is the longest stored key that is a prefix of this query?”
- “List the keys in lexicographic order.”
You can do these with hashing, but you typically need extra indexing, sorting, or scanning.
A trie is designed around prefix structure. It stores many strings such that shared prefixes are represented once, then branch when the next character differs.
A trie is a rooted tree where:

- each edge is labeled with a single character,
- each node represents the prefix spelled by the edge labels on its root→node path, and
- some nodes are marked terminal, meaning a complete stored key ends there.
If the root represents the empty prefix "", then following edges labeled 'c' → 'a' → 't' leads you to a node representing the prefix "cat".
This last point is crucial: if you store "car" and "cart", then the node for "car" is terminal, while it also has a child edge 't'.
Operations on tries are usually measured in terms of the string length.
Let k be the length of a key. Search/insert/delete walk one character at a time. That’s k steps.
So:

- search(s): O(k)
- insert(s): O(k)
- delete(s): O(k)
This is not “always faster” than hash tables—constant factors matter—but it’s predictably proportional to key length, and it enables efficient prefix queries.
Each node needs a way to get from a character to the child node.
We’ll refer to this as child_map: a per-node mapping from a character c to the child node reached by the edge labeled c.
Typical representations:
| Representation | Lookup for next char | Memory | Notes |
|---|---|---|---|
| Fixed array (e.g., 26 lowercase) | O(1) | High | Fast, but wastes space if sparse |
| Hash map / dictionary | O(1) average | Medium | Flexible alphabets, typical choice |
| Balanced BST map | O(log Σ) | Medium | Deterministic, Σ = alphabet size |
The trie’s headline O(k) assumes child lookup is O(1) (array or hash map). If you use a tree-map, you get O(k log Σ).
A trie can store:

- A set of strings: each node carries a boolean isTerminal.
- A map from strings to values: each terminal node also stores a value (or a pointer to a value).

Both variants are common; the mechanics are the same.
Tries are powerful because they turn string comparison into shared traversal.
In a sorted list, to compare two strings you scan until they differ. In a trie, you physically share that scanned prefix: once a prefix exists as a path, every key with that prefix reuses it.
Suppose we insert these keys: "to", "tea", "ten".

Start at the root (""), then:

- Insert "to": create node("t") and node("to"); mark node("to") terminal.
- Insert "tea": reuse node("t"), create node("te") and node("tea"); mark node("tea") terminal.
- Insert "ten": reuse node("t") and node("te"), create node("ten"); mark node("ten") terminal.

Notice how "to" and "tea" share the node("t"), and how "tea" and "ten" share node("te").
A trie node represents a prefix, but not every prefix is a stored key.
Example: insert "tea" only. The structure is the path root —'t'→ node("t") —'e'→ node("te") —'a'→ node("tea"), and only node("tea") is marked terminal.
Now search for "te": you can walk 't' then 'e' and arrive at node("te"). If you didn’t have a terminal marker, you might incorrectly conclude "te" is present. Terminal markers make the answer precise.
Formally: a string s is stored if and only if you can traverse s from the root and arrive at a node with isTerminal = true.

Let p be a prefix. If you can traverse p (character by character) to reach node(p), then at least one stored key has p as a prefix, and every stored key starting with p corresponds to a terminal node in node(p)’s subtree. If the traversal fails, no stored key starts with p.
This is the key property that hash tables don’t naturally provide.
Often, nodes store extra metadata to answer queries efficiently:
- prefixCount: how many keys pass through this node
- terminalCount: how many keys end here (useful for multisets)

Then:

- “How many keys start with p?” = prefixCount at node(p)

This is not required for basic tries, but it’s a frequent real-world upgrade.
Let k = |s| be the number of characters.
Traversal visits exactly k edges, doing a child_map lookup at each step.
So the work is:

∑ over the k characters of (one child_map lookup + one step down)

If child lookup is O(1) average, the total is O(k).

If child lookup is O(log Σ), the total is O(k log Σ).
Where Σ is alphabet size (e.g., 26 lowercase, 128 ASCII, or full Unicode codepoints).
Tries often use more memory than a hash table for the same set of strings, because you store nodes and child maps.
But they compress shared prefixes. If your dataset has lots of common prefixes (URLs, file paths, dictionary words, identifiers), a trie can be surprisingly memory-competitive and faster for prefix operations.
A trie is simple conceptually, but correctness depends on a few details:
We’ll describe operations for a trie that stores a set of strings with isTerminal.
To search s, follow the unique path dictated by its characters. If at any character you can’t follow the edge, the string isn’t stored.
Insertion is just “create missing edges along the path, then mark the endpoint terminal.” If the path already exists, you only update the terminal marker.
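These two operations can be sketched in Python (a minimal set-of-strings trie; the class and method names are illustrative, not from any particular library):

```python
class TrieNode:
    """One node per prefix; child_map maps a character to the next node."""
    def __init__(self):
        self.child_map = {}       # char -> TrieNode
        self.is_terminal = False  # does a stored key end exactly here?

class Trie:
    def __init__(self):
        self.root = TrieNode()    # represents the empty prefix ""

    def insert(self, s):
        """Create missing edges along the path, then mark the endpoint terminal."""
        node = self.root
        for c in s:
            if c not in node.child_map:
                node.child_map[c] = TrieNode()
            node = node.child_map[c]
        node.is_terminal = True

    def search(self, s):
        """Follow the unique path; the key is stored only if it ends on a terminal node."""
        node = self.root
        for c in s:
            if c not in node.child_map:
                return False      # edge missing: the string isn't stored
            node = node.child_map[c]
        return node.is_terminal
```

Using a dict as child_map corresponds to the hash-map row of the representation table: O(1) average child lookup, memory only for edges that exist.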
If you delete "inn" but keep "in", you must not remove the shared prefix nodes. You only remove nodes that become unreachable from any terminal key.
A standard approach:

1. Traverse to node(s), recording the path as a stack of (parent, char) pairs. If an edge is missing, or node(s) is not terminal, s isn’t stored; stop.
2. Unmark node(s).isTerminal.
3. Pop the stack: while the current node is non-terminal and has an empty child_map, remove it from its parent’s child_map. Stop at the first node that is still terminal or still has children.

This is basically garbage-collecting dead suffix nodes.
So delete is O(k) time.
If your alphabet is ASCII letters, an array of size 26 or 52 might be great.
If your keys are arbitrary Unicode strings, a sparse map is safer.
This choice affects constants and memory but not the core idea.
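As a sketch, the two representations look like this in Python (the node classes here are hypothetical, shown only to contrast the layouts):

```python
# Fixed array for lowercase ASCII: true O(1) child lookup, but every node
# pays for 26 slots even if it has one child.
class ArrayNode:
    def __init__(self):
        self.children = [None] * 26    # index = ord(c) - ord('a')
        self.is_terminal = False

    def child(self, c):
        return self.children[ord(c) - ord('a')]

# Hash map for arbitrary alphabets: only edges that exist consume memory,
# and any character (including Unicode) is a valid label.
class MapNode:
    def __init__(self):
        self.children = {}             # char -> MapNode
        self.is_terminal = False

    def child(self, c):
        return self.children.get(c)    # None if the edge is absent
```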
| Task | Hash table | Trie |
|---|---|---|
| Exact membership | O(1) avg | O(k) |
| Prefix query | Not natural | Natural (traverse prefix then DFS) |
| Lexicographic iteration | Requires sorting | Natural (DFS in character order) |
| Memory | Often lower | Often higher, but shares prefixes |
| Worst-case guarantees | Depends on hash | Deterministic O(k) given child lookup |
A useful mindset: a trie is an indexing structure for strings where structure matters (prefixes), not just equality.
Autocomplete is the canonical trie use-case:
To enumerate keys:

1. Traverse the typed prefix p to reach node(p); if an edge is missing, there are no completions.
2. DFS the subtree of node(p) in character order, appending edge labels as you descend; emit p plus the accumulated suffix at every terminal node.
If you need “top-k suggestions,” you often store extra info at nodes (e.g., most frequent completions, or a heap of top terms).
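The two-phase enumeration can be sketched with nested dicts, using a sentinel key as the terminal marker (the END = "$" sentinel is an assumption of this sketch; it presumes '$' never occurs in keys):

```python
END = "$"  # sentinel: a stored key ends at this node (assumes '$' not in keys)

def insert(root, s):
    node = root
    for c in s:
        node = node.setdefault(c, {})
    node[END] = True

def keys_with_prefix(root, p):
    """Phase 1: walk the prefix. Phase 2: DFS the subtree, emitting terminal nodes."""
    node = root
    for c in p:
        if c not in node:
            return []             # prefix absent: no completions
        node = node[c]
    results = []

    def dfs(n, acc):
        if END in n:
            results.append(p + acc)
        for c in sorted(k for k in n if k != END):  # character order => lexicographic output
            dfs(n[c], acc + c)

    dfs(node, "")
    return results
```

The walk costs O(|p|); the DFS cost is output-sensitive, scaling with the subtree actually visited.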
Given a query string q, you may want the longest prefix of q that exists as a key.
Traverse characters until you can’t continue, tracking the last terminal node you saw.
This shows up in:

- IP routing (longest-prefix match on bitwise tries),
- tokenization and word segmentation (match the longest dictionary word at a position),
- resolving file paths and URLs against a set of registered prefixes.
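The traversal can be sketched as follows (nested dicts with a '$' sentinel as the terminal marker; the example keys are made up):

```python
END = "$"  # sentinel: a stored key ends at this node (assumes '$' not in keys)

def insert(root, s):
    node = root
    for c in s:
        node = node.setdefault(c, {})
    node[END] = True

def longest_prefix_of(root, q):
    """Walk q until an edge is missing, remembering the last terminal node seen."""
    node, best = root, None
    for i, c in enumerate(q):
        if c not in node:
            break                 # can't continue: stop walking
        node = node[c]
        if END in node:
            best = q[:i + 1]      # q[0..i] is a stored key; remember it
    return best                   # None if no prefix of q is a stored key

table = {}
for key in ["in", "int", "inter"]:
    insert(table, key)
```

Here longest_prefix_of(table, "interface") walks i-n-t-e-r, updating the best match at "in", "int", and "inter", then stops at 'f'.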
If each node tracks prefixCount (how many inserted strings pass through it), then you can answer:

- “How many keys start with p?” in O(|p|): traverse to node(p) and read its prefixCount.

Update rule during insert: for every node on the path of s, prefixCount += 1.

Delete similarly decrements.
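A counting trie is a small extension of the basic node (a sketch; it assumes each key is inserted at most once, since duplicate inserts would inflate the counters):

```python
class CountingNode:
    def __init__(self):
        self.child_map = {}
        self.prefix_count = 0     # how many keys pass through this node
        self.is_terminal = False

def insert(root, s):
    node = root
    for c in s:
        node = node.child_map.setdefault(c, CountingNode())
        node.prefix_count += 1    # one more key passes through here
    node.is_terminal = True

def count_with_prefix(root, p):
    """O(|p|): just walk the prefix and read the counter."""
    node = root
    for c in p:
        if c not in node.child_map:
            return 0
        node = node.child_map[c]
    return node.prefix_count
```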
| Variant | Idea | Why it matters |
|---|---|---|
| Radix tree / compact trie | Compress chains of single-child nodes into multi-character edges | Saves memory, can be faster |
| Ternary search tree | Each node stores a character and 3 pointers (less/equal/greater) | Memory-friendly alternative |
| Bitwise trie | Alphabet is {0,1} for integers / IP prefixes | Very fast longest-prefix operations |
A key selling point is that lookup doesn’t scale with the number of stored strings n the same way many other structures do.
That said, enumeration from a prefix node can output many results; the time necessarily scales with how many keys you return.
Build a trie for the set {"to", "tea", "ten"}. Then determine whether "te" and "ten" are contained.
Start with an empty root node representing prefix "".
Initialize root.child_map = {} and root.isTerminal = false.
Insert "to":
Insert "tea":
Insert "ten":
Search "te":
Therefore "te" ∉ set.
Search "ten":
Therefore "ten" ∈ set.
Insight: Reaching a node is not enough; the terminal marker is what distinguishes a stored key from a prefix that merely exists because longer keys share it.
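The example can be replayed with a short script (nested dicts, with a '$' sentinel standing in for isTerminal):

```python
END = "$"  # sentinel terminal marker; assumes '$' never appears in a key

def insert(root, s):
    node = root
    for c in s:
        node = node.setdefault(c, {})
    node[END] = True

def search(root, s):
    node = root
    for c in s:
        if c not in node:
            return False
        node = node[c]
    return END in node

root = {}
for key in ["to", "tea", "ten"]:
    insert(root, key)

print(sorted(root))          # ['t']  -- all three keys share the root's single 't' child
print(search(root, "te"))    # False -- node("te") exists but is not terminal
print(search(root, "ten"))   # True
```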
Store {"in", "inn", "into"}. Delete "inn" and show what remains.
Insert "in": create path root —'i'→ node("i") —'n'→ node("in"). Mark node("in") terminal.
Insert "inn": traverse to node("in"), then add 'n' to create node("inn"). Mark node("inn") terminal.
Insert "into": traverse root-'i'-'n' to node("in"); then add 't'→node("int"), 'o'→node("into"). Mark node("into") terminal.
Now node("in") is terminal and has children 'n' and 't'.
Delete "inn":
Traversal path: (root,'i'), (node("i"),'n'), (node("in"),'n') leads to node("inn").
node("inn").isTerminal is true, so proceed.
Unmark node("inn").isTerminal = false.
Now decide cleanup: node("inn") is now non-terminal and has no children, so remove it by deleting the 'n' entry from node("in").child_map.
Stop cleanup at node("in") because node("in") is terminal (represents "in") and also still has child 't' for "into".
So we must keep node("in") and its ancestors.
Resulting stored keys are {"in", "into"}.
The shared prefix nodes remain intact.
Insight: Deletion is “unmark terminal, then prune dead leaves upward until you reach a node that is still needed (terminal or has other children).”
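Example 2 can be replayed with a sketch of delete (nested dicts with a '$' sentinel; in this representation, a “non-terminal leaf” is simply an empty dict):

```python
END = "$"  # sentinel terminal marker; assumes '$' never appears in a key

def insert(root, s):
    node = root
    for c in s:
        node = node.setdefault(c, {})
    node[END] = True

def search(root, s):
    node = root
    for c in s:
        if c not in node:
            return False
        node = node[c]
    return END in node

def delete(root, s):
    """Unmark the terminal, then prune dead suffix nodes upward."""
    node, stack = root, []          # stack of (parent, char) pairs along the path
    for c in s:
        if c not in node:
            return False            # path missing: key not stored
        stack.append((node, c))
        node = node[c]
    if END not in node:
        return False                # reachable prefix, but not a stored key
    del node[END]
    while stack:
        parent, c = stack.pop()
        if parent[c]:               # still terminal or has children: stop pruning
            break
        del parent[c]               # empty dict = dead suffix node; remove it
    return True

root = {}
for key in ["in", "inn", "into"]:
    insert(root, key)

delete(root, "inn")
print(search(root, "in"))    # True  -- shared prefix survives
print(search(root, "into"))  # True
print(search(root, "inn"))   # False
```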
Given keys {"car", "card", "care", "cat", "dog"}, list all keys with prefix "car".
Traverse prefix "car":
root —'c'→ node("c") —'a'→ node("ca") —'r'→ node("car").
If any edge were missing, the answer set would be empty.
At node("car"), check terminal:
node("car").isTerminal = true, so include "car" in results.
Enumerate subtree from node("car"): in character order, follow 'd' to node("card") (terminal, so emit "card"), then 'e' to node("care") (terminal, so emit "care").

Stop when DFS finishes.
Collected results: ["car", "card", "care"].
Insight: Prefix search is a two-phase operation: O(|p|) to reach node(p), then output-sensitive traversal of its subtree.
A trie stores strings by sharing prefixes: each node corresponds to the prefix along the root→node path.
Edges are labeled by characters; branching happens on the next character.
A terminal marker (or stored value) is required to distinguish a complete key from a prefix node.
Search/insert/delete are O(k) in the key length k (assuming O(1) child_map lookup).
Prefix queries are natural: traverse the prefix, then enumerate the subtree.
The child_map representation (array vs hash map) controls constant factors and memory usage.
Deletion works by unmarking terminal and pruning nodes that become non-terminal leaves.
Forgetting the terminal marker and treating every reachable prefix node as a stored key (causes false positives like "te" when only "tea" exists).
Implementing delete by removing the whole path, accidentally deleting other keys that share the prefix (e.g., deleting "inn" breaks "in").
Using a fixed-size array child_map for a large or unknown alphabet (wastes memory heavily or becomes incorrect with unexpected characters).
Reporting prefix-query time as just O(|p|) and ignoring the output-sensitive subtree traversal cost.
You insert the keys {"a", "ab", "abc", "b"} into a trie. Which nodes must be terminal? List the terminal prefixes.
Hint: A node is terminal exactly when some inserted key ends at that node, even if it has children.
Terminal prefixes are: "a", "ab", "abc", and "b". Note that "a" and "ab" are terminal even though they have descendants.
Design a trie node structure for lowercase English words. Compare using (1) an array of length 26 and (2) a hash map for child_map. Give one scenario where each choice is better.
Hint: Think about sparsity: how many children does a typical node have, and how big is the alphabet?
Array(26): child lookup is true O(1) with very small constant; best when alphabet is fixed and small and performance is critical (e.g., many lookups, dense branching). Hash map: saves memory when most nodes have few children; best when the trie is sparse or alphabet can vary (e.g., mixed-case, punctuation, Unicode).
Implement (in pseudocode) delete(s) for a trie storing a set of strings with isTerminal and child_map. Your delete should not remove nodes needed by other strings.
Hint: Record the path while descending (stack of (parent, char)), then prune upward while nodes are non-terminal leaves.
One correct pseudocode approach:
```
function delete(s):
    node = root
    stack = []                      // pairs (parentNode, char)
    for c in s:
        if c not in node.child_map: return false
        stack.push((node, c))
        node = node.child_map[c]
    if node.isTerminal == false: return false
    node.isTerminal = false
    // prune upward: remove dead suffix nodes
    while stack not empty:
        (parent, c) = stack.pop()
        child = parent.child_map[c]
        if child.isTerminal == false and child.child_map is empty:
            remove parent.child_map[c]
        else:
            break
    return true
```