The two-operation structure
Union-Find (also Disjoint Set Union, DSU) tracks a collection of items partitioned into non-overlapping groups, and supports exactly two operations — both, after optimizations, in effectively O(1):
find(x)— which group is x in?union(x, y)— merge x's and y's groups.
That's the entire API. Its question is "are these two things in the same
group?" — find(x) == find(y) — and an astonishing number of problems
are secretly that question:
- Are these two computers connected, given these network links?
- Do these two people share a friend circle?
- Does adding this road create a cycle (i.e., were its endpoints already connected)?
- How many islands / clusters / connected components exist?
BFS/DFS can answer connectivity too — but they re-explore the world per query. Union-Find shines when connections arrive incrementally and you ask connectivity between additions: it maintains the answer as it goes.
How it works: forests of pointers
Each group is a tree (not the BST kind — no order, just "who's my parent"), and the root represents the group. Every item holds a parent pointer; roots point to themselves:
groups {0,1,2} and {3,4}: parent array: [0, 0, 0, 3, 3]
0 3 find(2): 2 → 0. root = 0
/ \ | find(4): 4 → 3. root = 3
1 2 4 same group? 0 ≠ 3 → no
union(2,4): root 3 now points to root 0
class UnionFind:
def __init__(self, n):
self.parent = list(range(n)) # everyone is their own root
self.rank = [0] * n # tree-height estimate, for balance
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # ① path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False # already connected — useful signal!
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx # ② union by rank: short tree under tall
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
return True
// Java — same structure, iterative find
int find(int x) {
while (parent[x] != x) {
parent[x] = parent[parent[x]]; // path halving — same effect
x = parent[x];
}
return x;
}
// C++
int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
The two optimizations (and the magic complexity)
Naively, trees can degenerate into chains — find becomes O(n), the unbalanced-BST disease again. Two one-line cures:
- Path compression — while finding the root, repoint every visited node directly at it. The tree flattens itself with use.
- Union by rank/size — always hang the shorter tree under the taller one, so height grows only when equals merge.
Together: amortized O(α(n)) per operation, where α is the inverse Ackermann function — ≤ 4 for any n that fits in this universe. Say "effectively constant time" and you're correct at any interview level; naming α(n) is the flourish.
Watch it run
Every item carries one arrow to its "boss"; follow arrows up and you find the group's leader. Watch unions merge trees with a single re-pointed arrow — and watch path compression flatten a chain the first time it's walked, so it never needs walking again.
1/258 items, 8 groups: everyone starts as their own leader (each arrow points to itself).
The signature move: cycle detection
union returning False means "these were already connected" — so adding
that edge would close a cycle. That single bit powers the famous
applications:
def has_cycle(n, edges):
uf = UnionFind(n)
for a, b in edges:
if not uf.union(a, b): # already connected → this edge closes a loop
return True
return False
- Kruskal's minimum spanning tree (Level 3 greedy): sort edges by weight, take each edge unless it forms a cycle — Union-Find is that check. Cheapest network connecting all cities, in 15 lines total.
- Counting components: start with n groups; every successful union decrements the count. "Number of islands II" (land appears cell by cell) is exactly this, and is brutal without DSU.
- Accounts merge / friend circles / equation satisfiability — anything where equivalences trickle in and must be consolidated.
Production perspective
- Network/infra tooling uses DSU for reachability under churn — clusters merging as links come up.
- Image processing: connected-component labeling (which pixels form one blob) — DSU over the pixel grid.
- Compilers & type checkers: unification of type variables — the "equations arrive incrementally" pattern precisely.
- The deeper lesson it teaches: representative elements. Many systems pick one canonical member to stand for an equivalence class (leader election, canonical URLs, dedup keys) — DSU is that idea as a data structure.
Common mistakes
- Skipping both optimizations — correct but O(n) per op; on 10⁵ unions, the difference between instant and timeout.
- Comparing items instead of roots —
parent[x] == parent[y]is wrong (parents may be mid-tree); always comparefind(x) == find(y). - Union-ing the items, not the roots —
parent[y] = xinstead ofparent[ry] = rxcorrupts the forest invariant. - Forgetting it can't un-merge. DSU has no
split— deletions/ disconnections break it (offline tricks exist; out of scope — but know the limitation before proposing DSU in design discussions). - Strings/objects as items — map them to indices first (a dict: name → id), or use a parent-dict directly.
Interview perspective
Practice
- Build: implement UnionFind from memory with both optimizations —
it's ~15 lines. Test: 10 items, union pairs, verify groups; check that
unionof an already-joined pair returns False. - Components: n computers and a list of cables — how many isolated clusters remain? (Count = n − successful unions.) Then: what's the minimum number of new cables to connect everything?
- Cycle: given a tree-plus-one-extra-edge (LeetCode "Redundant Connection"), find the edge to remove — pure DSU.
- Capstone: Kruskal's MST on a small weighted graph by hand — sort edges, take-unless-cycle — then code it. You've now combined sorting, greedy (Level 3), and DSU into a real algorithm.
That completes the Level 2 data structures. Heaps and graphs round out the structures; then the Level 3 patterns start composing them into solutions.
Practice — climb the ladder
The trigger: dynamic connectivity — "are these in the same group?" while edges keep arriving. If edges are fixed, plain DFS also works; say that.
Climb in order — every rung assumes the one above it. Solve on LeetCode, then tick it here; progress is saved on this device.
Warm-up
the API doing its one jobunion(u,v) every edge, then one find(a)==find(b) — connectivity in five lines.
Core
components, cycles, and merging by key- Number of ProvincesMedium
Count components — start with n, every successful union decrements.
- Redundant ConnectionMedium
Cycle detection — the first edge whose endpoints are ALREADY connected.
- Accounts MergeMedium
Union by shared email — non-obvious modeling, the real interview skill.
Stretch
union-find in disguiseUnion the == constraints, then check every != — constraints as connectivity.
Rows and columns as nodes — answer = stones − components; modeling at full stretch.