How does algorithm for Longest increasing subsequence [O(nlogn)] work?
Categories:
Demystifying the O(n log n) Longest Increasing Subsequence Algorithm
![Hero image for How does algorithm for Longest increasing subsequence [O(nlogn)] work?](/img/d69d2f3c-hero.webp)
Explore the efficient O(n log n) algorithm for finding the Longest Increasing Subsequence (LIS), understanding its core principles, data structures, and correctness proof.
The Longest Increasing Subsequence (LIS) problem is a classic computer science challenge. Given a sequence of numbers, the goal is to find the longest subsequence such that all elements of the subsequence are sorted in increasing order. While a straightforward dynamic programming approach yields an O(n^2) solution, a more advanced technique can solve it in O(n log n) time. This article will break down the O(n log n) algorithm, explain its underlying logic, and provide insights into its correctness.
Understanding the Core Idea: Maintaining Potential LIS Endings
The O(n log n) LIS algorithm doesn't explicitly construct all possible increasing subsequences. Instead, it maintains an array, let's call it tails
, where tails[i]
stores the smallest tail of all increasing subsequences of length i+1
. This is a crucial distinction: tails[i]
is not necessarily an element of the LIS itself, but rather the smallest possible value that could end an increasing subsequence of that specific length.
As we iterate through the input array, for each element x
, we perform one of two operations:
- If
x
is greater than all elements intails
: This meansx
can extend the longest increasing subsequence found so far. We appendx
totails
, effectively increasing the length of the LIS by one. - If
x
is not greater than all elements intails
: This meansx
can potentially replace an existing tail to form a new increasing subsequence of the same length, but with a smaller ending element. We find the smallest element intails
that is greater than or equal tox
and replace it withx
. This replacement is done using binary search (specifically,std::lower_bound
in C++ or similar functions in other languages), which is why the algorithm achieves O(log n) per element, leading to an overall O(n log n) complexity.
flowchart TD A[Start with empty 'tails' array] --> B{Process each element 'x' in input array?} B -- Yes --> C{Is 'x' greater than all elements in 'tails'?} C -- Yes --> D[Append 'x' to 'tails'] C -- No --> E[Find smallest element in 'tails' >= 'x' using binary search] E --> F[Replace that element with 'x'] D --> B F --> B B -- No --> G[Length of 'tails' is LIS length] G --> H[End] style A fill:#f9f,stroke:#333,stroke-width:2px style H fill:#f9f,stroke:#333,stroke-width:2px
Flowchart of the O(n log n) LIS algorithm's core logic.
Detailed Walkthrough with an Example
Let's trace the algorithm with an example: arr = [3, 1, 4, 1, 5, 9, 2, 6]
tails = []
- Process
3
:tails
is empty. Append3
.tails = [3]
- Process
1
:1
is not greater than3
. Smallest element>= 1
is3
. Replace3
with1
.tails = [1]
- Process
4
:4
is greater than1
. Append4
.tails = [1, 4]
- Process
1
:1
is not greater than4
. Smallest element>= 1
is1
. Replace1
with1
(no change).tails = [1, 4]
- Process
5
:5
is greater than4
. Append5
.tails = [1, 4, 5]
- Process
9
:9
is greater than5
. Append9
.tails = [1, 4, 5, 9]
- Process
2
:2
is not greater than9
. Smallest element>= 2
is4
. Replace4
with2
.tails = [1, 2, 5, 9]
- Process
6
:6
is not greater than9
. Smallest element>= 6
is9
. Replace9
with6
.tails = [1, 2, 5, 6]
After processing all elements, the length of tails
is 4
. Thus, the LIS length is 4
. Note that [1, 2, 5, 6]
is indeed an increasing subsequence, but it's not necessarily the LIS. For example, [3, 4, 5, 9]
is also an LIS of length 4. The tails
array only guarantees the correct length, not the actual subsequence itself. If you need to reconstruct the LIS, you would need to store predecessors, which adds complexity but doesn't change the time complexity.
#include <vector>
#include <algorithm>
int lengthOfLIS(std::vector<int>& nums) {
if (nums.empty()) {
return 0;
}
std::vector<int> tails;
tails.push_back(nums[0]);
for (size_t i = 1; i < nums.size(); ++i) {
int num = nums[i];
if (num > tails.back()) {
// If current number is greater than the largest tail,
// it extends the longest LIS found so far.
tails.push_back(num);
} else {
// Find the smallest tail that is >= num and replace it.
// std::lower_bound returns an iterator to the first element
// that is not less than 'num'.
*std::lower_bound(tails.begin(), tails.end(), num) = num;
}
}
return tails.size();
}
C++ implementation of the O(n log n) LIS algorithm.
tails
array is always sorted in increasing order. This property is crucial for the binary search step to work correctly and efficiently. If tails
were not sorted, we couldn't use binary search and would revert to O(n) search per element, making the overall algorithm O(n^2).Proof of Correctness
The correctness of this algorithm hinges on two key invariants:
- The
tails
array is always sorted in increasing order. This is maintained because we either append an element larger than the current last element, or we replace an element with a smaller one, ensuring the sorted property is preserved. tails[i]
stores the smallest ending element of all increasing subsequences of lengthi+1
found so far. Let's prove this by induction.- Base Case: When
tails
has one element,tails[0]
, it is the smallest ending element of an LIS of length 1. - Inductive Step: Assume
tails[k]
is the smallest ending element of an LIS of lengthk+1
for allk < i
. When we process a new elementx
:- If
x > tails.back()
: We appendx
. The newtails.back()
(which isx
) is the smallest ending element of an LIS of lengthtails.size()
because any other LIS of that length would have been formed by extending a shorter LIS with a smaller element, which would have already resulted in a smallertails.back()
at an earlier step. - If
x <= tails.back()
: We findtails[j]
such thattails[j-1] < x <= tails[j]
(orx <= tails[0]
ifj=0
). We replacetails[j]
withx
. This replacement is valid becausex
can now form an increasing subsequence of lengthj+1
ending withx
. Sincex <= tails[j]
,x
is now the smallest possible ending element for an increasing subsequence of lengthj+1
. Any other increasing subsequence of lengthj+1
ending with a value smaller thanx
would have already updatedtails[j]
to that smaller value.
- If
- Base Case: When
Because tails[i]
always holds the smallest possible ending element for an increasing subsequence of length i+1
, the final length of the tails
array accurately represents the length of the Longest Increasing Subsequence.