How does algorithm for Longest increasing subsequence [O(nlogn)] work?

Learn how does algorithm for longest increasing subsequence [o(nlogn)] work? with practical examples, diagrams, and best practices. Covers algorithm, correctness, lis development techniques with vi...

Demystifying the O(n log n) Longest Increasing Subsequence Algorithm

Abstract representation of an increasing sequence within a larger data set, illustrating the LIS problem.

Explore the efficient O(n log n) algorithm for finding the Longest Increasing Subsequence (LIS), understanding its core principles, data structures, and correctness proof.

The Longest Increasing Subsequence (LIS) problem is a classic computer science challenge. Given a sequence of numbers, the goal is to find the longest subsequence such that all elements of the subsequence are sorted in increasing order. While a straightforward dynamic programming approach yields an O(n^2) solution, a more advanced technique can solve it in O(n log n) time. This article will break down the O(n log n) algorithm, explain its underlying logic, and provide insights into its correctness.

Understanding the Core Idea: Maintaining Potential LIS Endings

The O(n log n) LIS algorithm doesn't explicitly construct all possible increasing subsequences. Instead, it maintains an array, let's call it tails, where tails[i] stores the smallest tail of all increasing subsequences of length i+1. This is a crucial distinction: tails[i] is not necessarily an element of the LIS itself, but rather the smallest possible value that could end an increasing subsequence of that specific length.

As we iterate through the input array, for each element x, we perform one of two operations:

If x is greater than all elements in tails: This means x can extend the longest increasing subsequence found so far. We append x to tails, effectively increasing the length of the LIS by one.
If x is not greater than all elements in tails: This means x can potentially replace an existing tail to form a new increasing subsequence of the same length, but with a smaller ending element. We find the smallest element in tails that is greater than or equal to x and replace it with x. This replacement is done using binary search (specifically, std::lower_bound in C++ or similar functions in other languages), which is why the algorithm achieves O(log n) per element, leading to an overall O(n log n) complexity.

flowchart TD
    A[Start with empty 'tails' array] --> B{Process each element 'x' in input array?}
    B -- Yes --> C{Is 'x' greater than all elements in 'tails'?}
    C -- Yes --> D[Append 'x' to 'tails']
    C -- No --> E[Find smallest element in 'tails' >= 'x' using binary search]
    E --> F[Replace that element with 'x']
    D --> B
    F --> B
    B -- No --> G[Length of 'tails' is LIS length]
    G --> H[End]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#f9f,stroke:#333,stroke-width:2px

Flowchart of the O(n log n) LIS algorithm's core logic.

Detailed Walkthrough with an Example

Let's trace the algorithm with an example: arr = [3, 1, 4, 1, 5, 9, 2, 6]

tails = []
Process 3: tails is empty. Append 3. tails = [3]
Process 1: 1 is not greater than 3. Smallest element >= 1 is 3. Replace 3 with 1. tails = [1]
Process 4: 4 is greater than 1. Append 4. tails = [1, 4]
Process 1: 1 is not greater than 4. Smallest element >= 1 is 1. Replace 1 with 1 (no change). tails = [1, 4]
Process 5: 5 is greater than 4. Append 5. tails = [1, 4, 5]
Process 9: 9 is greater than 5. Append 9. tails = [1, 4, 5, 9]
Process 2: 2 is not greater than 9. Smallest element >= 2 is 4. Replace 4 with 2. tails = [1, 2, 5, 9]
Process 6: 6 is not greater than 9. Smallest element >= 6 is 9. Replace 9 with 6. tails = [1, 2, 5, 6]

After processing all elements, the length of tails is 4. Thus, the LIS length is 4. Note that [1, 2, 5, 6] is indeed an increasing subsequence, but it's not necessarily the LIS. For example, [3, 4, 5, 9] is also an LIS of length 4. The tails array only guarantees the correct length, not the actual subsequence itself. If you need to reconstruct the LIS, you would need to store predecessors, which adds complexity but doesn't change the time complexity.

#include <vector>
#include <algorithm>

int lengthOfLIS(std::vector<int>& nums) {
    if (nums.empty()) {
        return 0;
    }

    std::vector<int> tails;
    tails.push_back(nums[0]);

    for (size_t i = 1; i < nums.size(); ++i) {
        int num = nums[i];
        if (num > tails.back()) {
            // If current number is greater than the largest tail,
            // it extends the longest LIS found so far.
            tails.push_back(num);
        } else {
            // Find the smallest tail that is >= num and replace it.
            // std::lower_bound returns an iterator to the first element
            // that is not less than 'num'.
            *std::lower_bound(tails.begin(), tails.end(), num) = num;
        }
    }

    return tails.size();
}

C++ implementation of the O(n log n) LIS algorithm.

💡

The tails array is always sorted in increasing order. This property is crucial for the binary search step to work correctly and efficiently. If tails were not sorted, we couldn't use binary search and would revert to O(n) search per element, making the overall algorithm O(n^2).

Proof of Correctness

The correctness of this algorithm hinges on two key invariants:

The tails array is always sorted in increasing order. This is maintained because we either append an element larger than the current last element, or we replace an element with a smaller one, ensuring the sorted property is preserved.
tails[i] stores the smallest ending element of all increasing subsequences of length i+1 found so far. Let's prove this by induction.
- Base Case: When tails has one element, tails[0], it is the smallest ending element of an LIS of length 1.
- Inductive Step: Assume tails[k] is the smallest ending element of an LIS of length k+1 for all k < i. When we process a new element x:
  - If x > tails.back(): We append x. The new tails.back() (which is x) is the smallest ending element of an LIS of length tails.size() because any other LIS of that length would have been formed by extending a shorter LIS with a smaller element, which would have already resulted in a smaller tails.back() at an earlier step.
  - If x <= tails.back(): We find tails[j] such that tails[j-1] < x <= tails[j] (or x <= tails[0] if j=0). We replace tails[j] with x. This replacement is valid because x can now form an increasing subsequence of length j+1 ending with x. Since x <= tails[j], x is now the smallest possible ending element for an increasing subsequence of length j+1. Any other increasing subsequence of length j+1 ending with a value smaller than x would have already updated tails[j] to that smaller value.

Because tails[i] always holds the smallest possible ending element for an increasing subsequence of length i+1, the final length of the tails array accurately represents the length of the Longest Increasing Subsequence.

How does algorithm for Longest increasing subsequence [O(nlogn)] work?

Tags:

Categories:

Demystifying the O(n log n) Longest Increasing Subsequence Algorithm

Understanding the Core Idea: Maintaining Potential LIS Endings

Detailed Walkthrough with an Example

Proof of Correctness