Union find implementation using Python
Categories:
Implementing the Union-Find Data Structure in Python

Explore the Union-Find (Disjoint Set Union) data structure, its core operations, and a practical Python implementation for efficient set management.
The Union-Find data structure, also known as Disjoint Set Union (DSU), is a powerful tool for managing a collection of disjoint sets. It provides two primary operations: union
(merging two sets) and find
(determining which set an element belongs to). This article will guide you through understanding its principles and implementing it efficiently in Python.
Understanding Union-Find: Core Concepts
At its heart, Union-Find represents each set as a tree, where the root of the tree is the 'representative' of that set. Each element in the set points to its parent, and the root points to itself (or has a special indicator like -1 or its own index). The two main operations are:
find(i)
: This operation determines the representative (root) of the set containing elementi
. It does this by traversing up the parent pointers until it reaches the root.union(i, j)
: This operation merges the sets containing elementsi
andj
. It first finds the representatives of bothi
andj
. If they are different, it makes one root the parent of the other, effectively merging the two trees.
flowchart TD subgraph Initial State A[1] --> 1 B[2] --> 2 C[3] --> 3 D[4] --> 4 end subgraph After Union(1, 2) E[1] --> 2 F[2] --> 2 G[3] --> 3 H[4] --> 4 end subgraph After Union(3, 4) I[1] --> 2 J[2] --> 2 K[3] --> 4 L[4] --> 4 end subgraph After Union(1, 4) M[1] --> 2 N[2] --> 2 O[3] --> 4 P[4] --> 2 end Initial --> After Union(1, 2) After Union(1, 2) --> After Union(3, 4) After Union(3, 4) --> After Union(1, 4)
Visualizing Union-Find operations: Initial state, then merging sets.
Optimizations: Path Compression and Union by Rank/Size
Naive implementations of find
and union
can lead to skewed trees, resulting in O(N)
time complexity in the worst case. To achieve near-constant time complexity, two crucial optimizations are employed:
- Path Compression (for
find
): During afind
operation, as we traverse up to the root, we can make all visited nodes point directly to the root. This flattens the tree, significantly reducing the height for futurefind
operations on those nodes. - Union by Rank or Size (for
union
): When merging two sets, we can attach the root of the smaller tree (by rank/height or size) to the root of the larger tree. This helps keep the trees balanced and prevents them from becoming too tall. Union by rank uses an estimated height, while union by size uses the actual number of elements in the set.
class UnionFind:
def __init__(self, size):
# parent[i] stores the parent of element i
# If parent[i] is negative, i is a root, and its absolute value is the size of the set.
self.parent = [-1] * size
def find(self, i):
# Path compression: If i is not the root, set its parent to the root of its set
if self.parent[i] < 0:
return i
self.parent[i] = self.find(self.parent[i])
return self.parent[i]
def union(self, i, j):
root_i = self.find(i)
root_j = self.find(j)
if root_i != root_j:
# Union by size: Attach smaller tree to root of larger tree
if abs(self.parent[root_i]) < abs(self.parent[root_j]):
root_i, root_j = root_j, root_i # Swap to ensure root_i is the larger tree
self.parent[root_i] += self.parent[root_j] # Add size of root_j's set to root_i's set
self.parent[root_j] = root_i # Make root_i the parent of root_j
return True # Union successful
return False # i and j are already in the same set
# Example Usage:
uf = UnionFind(5) # Elements 0, 1, 2, 3, 4
print(f"Initial parents: {uf.parent}") # Expected: [-1, -1, -1, -1, -1]
uf.union(0, 1)
print(f"After union(0, 1): {uf.parent}") # Expected: [-2, 0, -1, -1, -1] (or similar, depending on which root becomes parent)
uf.union(2, 3)
print(f"After union(2, 3): {uf.parent}") # Expected: [-2, 0, -2, 2, -1]
uf.union(0, 2)
print(f"After union(0, 2): {uf.parent}") # Expected: [-4, 0, 0, 2, -1] (after path compression and union by size)
print(f"Find(3): {uf.find(3)}") # Expected: 0 (after path compression)
print(f"Parents after find(3): {uf.parent}") # Expected: [-4, 0, 0, 0, -1] (3 now points directly to 0)
print(f"Are 1 and 3 connected? {uf.find(1) == uf.find(3)}") # Expected: True
print(f"Are 1 and 4 connected? {uf.find(1) == uf.find(4)}") # Expected: False
O(α(N))
, where α
is the inverse Ackermann function. This function grows extremely slowly, making it practically constant for all realistic input sizes.Applications of Union-Find
The Union-Find data structure is incredibly versatile and finds applications in various algorithms and problems:
- Graph Algorithms: Detecting cycles in undirected graphs, Kruskal's algorithm for Minimum Spanning Tree (MST).
- Network Connectivity: Determining if two nodes are connected in a network, counting connected components.
- Image Processing: Grouping pixels into connected regions.
- Percolation: Simulating the flow through a random network.
- Game Development: Managing groups of units or territories.
1. Initialize the structure
Create a parent
list where each element i
initially points to itself (or has a negative value indicating it's a root and its set size).
2. Implement find
with path compression
Recursively find the root, and during the return path, update all visited nodes to point directly to the root.
3. Implement union
with union by size/rank
Find the roots of both elements. If different, merge the smaller tree under the root of the larger tree, updating the size/rank of the new root.
4. Test with various operations
Perform a series of union
and find
operations to verify correctness and observe the effects of optimizations.