Proof of correctness: Algorithm for diameter of a tree in graph theory
Categories:
Proof of Correctness: Algorithm for Tree Diameter in Graph Theory
Explore the algorithm for finding the diameter of a tree and delve into its formal proof of correctness, ensuring its reliability and efficiency.
The diameter of a tree is defined as the longest path between any two nodes in the tree. Finding this path is a fundamental problem in graph theory with applications in network design, phylogenetic tree analysis, and more. While several algorithms exist, a common and efficient approach involves two Breadth-First Search (BFS) traversals. This article will detail this algorithm and provide a rigorous proof of its correctness.
The Two-BFS Algorithm for Tree Diameter
The algorithm to find the diameter of a tree is surprisingly simple yet effective. It leverages the properties of trees and BFS to identify the two endpoints of the longest path. The core idea is that one endpoint of any longest path must be one of the farthest nodes from an arbitrary starting point. Once one such endpoint is found, the farthest node from it will be the other endpoint of a diameter.
flowchart TD A[Start: Pick an arbitrary node 'u'] --> B{Perform BFS from 'u'} B --> C[Find node 'x' farthest from 'u'] C --> D{Perform BFS from 'x'} D --> E[Find node 'y' farthest from 'x'] E --> F[The path between 'x' and 'y' is a diameter] F --> G[End]
Flowchart of the Two-BFS Algorithm for Tree Diameter
Let's break down the steps of the algorithm:
1. First BFS
Choose an arbitrary node u
in the tree. Perform a Breadth-First Search (BFS) starting from u
. During this BFS, keep track of the distance of each node from u
. Identify the node x
that is farthest from u
.
2. Second BFS
Perform another BFS, this time starting from node x
(the farthest node found in the first BFS). Again, track the distances of all nodes from x
. Identify the node y
that is farthest from x
.
3. Result
The path between x
and y
is a diameter of the tree, and the distance between x
and y
is the length of the diameter.
Proof of Correctness
To prove the correctness of the two-BFS algorithm, we need to show that the path found between x
and y
is indeed a longest path in the tree. The proof relies on two key lemmas.
Lemma 1: Let P
be any longest path in a tree, and let u
be an arbitrary node. If we perform a BFS from u
and find the node x
farthest from u
, then x
must be an endpoint of some longest path.
Proof of Lemma 1: Assume for contradiction that x
is not an endpoint of any longest path. Let P_max = (a, ..., b)
be a longest path in the tree. Since x
is not an endpoint of P_max
, x
must be outside P_max
or an internal node of P_max
. If x
is an internal node, then dist(u, x)
would be less than dist(u, a)
or dist(u, b)
(unless u
is also on P_max
and x
is one of its endpoints, which contradicts x
not being an endpoint of any longest path). If x
is outside P_max
, let c
be the node on P_max
closest to x
. Then the path (u, ..., x)
must intersect P_max
at some point. Consider the path (x, ..., c, ..., a)
or (x, ..., c, ..., b)
. One of these paths must be longer than (u, ..., x)
if x
is not an endpoint of a longest path, which contradicts x
being the farthest node from u
. Therefore, x
must be an endpoint of some longest path.
Lemma 2: If x
is an endpoint of a longest path P_max = (x, ..., y')
, then the node y
farthest from x
(found in the second BFS) must be y'
(the other endpoint of P_max
).
Proof of Lemma 2: Assume for contradiction that y
is not y'
. This means there exists a path (x, ..., y)
such that dist(x, y) > dist(x, y')
. But (x, ..., y')
is a longest path by definition. If dist(x, y) > dist(x, y')
, then (x, ..., y)
would be a path longer than P_max
, which contradicts P_max
being a longest path. Therefore, y
must be y'
, and the path (x, ..., y)
is indeed a longest path.
Algorithm Implementation (Pseudocode)
Here's a pseudocode representation of the algorithm. The BFS
function returns the farthest node and its distance from the source.
function BFS(graph, start_node):
distances = {} # Stores distance from start_node
queue = []
for node in graph.nodes():
distances[node] = infinity
distances[start_node] = 0
queue.append(start_node)
farthest_node = start_node
max_distance = 0
while queue is not empty:
current_node = queue.pop(0)
if distances[current_node] > max_distance:
max_distance = distances[current_node]
farthest_node = current_node
for neighbor in graph.neighbors(current_node):
if distances[neighbor] == infinity:
distances[neighbor] = distances[current_node] + 1
queue.append(neighbor)
return farthest_node, max_distance
function find_tree_diameter(graph):
# Step 1: First BFS from an arbitrary node (e.g., node 0)
node_u = graph.nodes()[0] # Assuming nodes are indexed or iterable
farthest_from_u, _ = BFS(graph, node_u)
# Step 2: Second BFS from the farthest node found
farthest_from_farthest, diameter_length = BFS(graph, farthest_from_u)
return diameter_length, farthest_from_u, farthest_from_farthest
Pseudocode for the Two-BFS Tree Diameter Algorithm