Union find implementation using Python

Learn union find implementation using python with practical examples, diagrams, and best practices. Covers python, list, union-find development techniques with visual explanations.

Implementing the Union-Find Data Structure in Python

Hero image for Union find implementation using Python

Explore the Union-Find (Disjoint Set Union) data structure, its core operations, and a practical Python implementation for efficient set management.

The Union-Find data structure, also known as Disjoint Set Union (DSU), is a powerful tool for managing a collection of disjoint sets. It provides two primary operations: union (merging two sets) and find (determining which set an element belongs to). This article will guide you through understanding its principles and implementing it efficiently in Python.

Understanding Union-Find: Core Concepts

At its heart, Union-Find represents each set as a tree, where the root of the tree is the 'representative' of that set. Each element in the set points to its parent, and the root points to itself (or has a special indicator like -1 or its own index). The two main operations are:

  1. find(i): This operation determines the representative (root) of the set containing element i. It does this by traversing up the parent pointers until it reaches the root.
  2. union(i, j): This operation merges the sets containing elements i and j. It first finds the representatives of both i and j. If they are different, it makes one root the parent of the other, effectively merging the two trees.
flowchart TD
    subgraph Initial State
        A[1] --> 1
        B[2] --> 2
        C[3] --> 3
        D[4] --> 4
    end

    subgraph After Union(1, 2)
        E[1] --> 2
        F[2] --> 2
        G[3] --> 3
        H[4] --> 4
    end

    subgraph After Union(3, 4)
        I[1] --> 2
        J[2] --> 2
        K[3] --> 4
        L[4] --> 4
    end

    subgraph After Union(1, 4)
        M[1] --> 2
        N[2] --> 2
        O[3] --> 4
        P[4] --> 2
    end

    Initial --> After Union(1, 2)
    After Union(1, 2) --> After Union(3, 4)
    After Union(3, 4) --> After Union(1, 4)

Visualizing Union-Find operations: Initial state, then merging sets.

Optimizations: Path Compression and Union by Rank/Size

Naive implementations of find and union can lead to skewed trees, resulting in O(N) time complexity in the worst case. To achieve near-constant time complexity, two crucial optimizations are employed:

  1. Path Compression (for find): During a find operation, as we traverse up to the root, we can make all visited nodes point directly to the root. This flattens the tree, significantly reducing the height for future find operations on those nodes.
  2. Union by Rank or Size (for union): When merging two sets, we can attach the root of the smaller tree (by rank/height or size) to the root of the larger tree. This helps keep the trees balanced and prevents them from becoming too tall. Union by rank uses an estimated height, while union by size uses the actual number of elements in the set.
class UnionFind:
    def __init__(self, size):
        # parent[i] stores the parent of element i
        # If parent[i] is negative, i is a root, and its absolute value is the size of the set.
        self.parent = [-1] * size

    def find(self, i):
        # Path compression: If i is not the root, set its parent to the root of its set
        if self.parent[i] < 0:
            return i
        self.parent[i] = self.find(self.parent[i])
        return self.parent[i]

    def union(self, i, j):
        root_i = self.find(i)
        root_j = self.find(j)

        if root_i != root_j:
            # Union by size: Attach smaller tree to root of larger tree
            if abs(self.parent[root_i]) < abs(self.parent[root_j]):
                root_i, root_j = root_j, root_i # Swap to ensure root_i is the larger tree
            
            self.parent[root_i] += self.parent[root_j] # Add size of root_j's set to root_i's set
            self.parent[root_j] = root_i # Make root_i the parent of root_j
            return True # Union successful
        return False # i and j are already in the same set

# Example Usage:
uf = UnionFind(5) # Elements 0, 1, 2, 3, 4

print(f"Initial parents: {uf.parent}") # Expected: [-1, -1, -1, -1, -1]

uf.union(0, 1)
print(f"After union(0, 1): {uf.parent}") # Expected: [-2, 0, -1, -1, -1] (or similar, depending on which root becomes parent)

uf.union(2, 3)
print(f"After union(2, 3): {uf.parent}") # Expected: [-2, 0, -2, 2, -1]

uf.union(0, 2)
print(f"After union(0, 2): {uf.parent}") # Expected: [-4, 0, 0, 2, -1] (after path compression and union by size)

print(f"Find(3): {uf.find(3)}") # Expected: 0 (after path compression)
print(f"Parents after find(3): {uf.parent}") # Expected: [-4, 0, 0, 0, -1] (3 now points directly to 0)

print(f"Are 1 and 3 connected? {uf.find(1) == uf.find(3)}") # Expected: True
print(f"Are 1 and 4 connected? {uf.find(1) == uf.find(4)}") # Expected: False

Applications of Union-Find

The Union-Find data structure is incredibly versatile and finds applications in various algorithms and problems:

  • Graph Algorithms: Detecting cycles in undirected graphs, Kruskal's algorithm for Minimum Spanning Tree (MST).
  • Network Connectivity: Determining if two nodes are connected in a network, counting connected components.
  • Image Processing: Grouping pixels into connected regions.
  • Percolation: Simulating the flow through a random network.
  • Game Development: Managing groups of units or territories.

1. Initialize the structure

Create a parent list where each element i initially points to itself (or has a negative value indicating it's a root and its set size).

2. Implement find with path compression

Recursively find the root, and during the return path, update all visited nodes to point directly to the root.

3. Implement union with union by size/rank

Find the roots of both elements. If different, merge the smaller tree under the root of the larger tree, updating the size/rank of the new root.

4. Test with various operations

Perform a series of union and find operations to verify correctness and observe the effects of optimizations.