Longest Common Subsequence

Hero Image

Problem Overview

The Longest Common Subsequence (LCS) problem is a classical problem in computer science and dynamic programming. The problem requires finding the longest subsequence common to two given sequences (strings) where the subsequence consists of elements that appear in both sequences in the same relative order, but not necessarily consecutively.

Example:

  • Input: text1 = "abcde", text2 = "ace"
  • Output: 3
  • Explanation: The longest common subsequence is "ace" with the length of 3.

Understanding the Problem

To grasp the concept of the Longest Common Subsequence, imagine comparing two strands of alphabets. Your task is not to rearrange them but to trace letters in the same order from both strands and see how long a match can be formed. The need for this calculation arises in various fields, including bioinformatics for DNA sequence analysis, version control systems, and diff tools for comparing file differences.

Key DSA Concepts and Theory

Dynamic Programming

Dynamic Programming (DP) is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable where the optimal solution for a problem can be constructed efficiently from optimal solutions for its subproblems. The LCS problem uses DP due to overlapping subproblems and optimal substructure properties.

Subsequence vs Subarray

  • Subsequence: Derived by deleting some elements of the array without changing the order of the remaining elements.
  • Subarray: A contiguous part of the array or string.

In LCS, we are interested in subsequence, not subarray.

Recurrence Relation

The crux of solving the LCS problem via DP lies in understanding its recurrence relation. Define dp[i][j] as the length of LCS of substrings text1[0...i-1] and text2[0...j-1]. The recurrence relation is given by:

  1. If text1[i-1] == text2[j-1], then dp[i][j] = dp[i-1][j-1] + 1.
  2. Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).

Initialization

For any i, dp[i][0] = 0 and for any j, dp[0][j] = 0 because the LCS of any string with an empty string is 0.

Solution Approach

The idea is to fill a DP table using a bottom-up approach by keeping a table to store lengths of longest common subsequence of substrings.

Step-by-Step Explanation

  1. Create a table dp with dimensions (len(text1)+1) x (len(text2)+1).
  2. Initialize all cells to 0.
  3. Iterate over both strings:
    • For matches, increment the diagonal value by 1.
    • For mismatches, take the maximum from the left cell or the upper cell.
  4. The value at dp[len(text1)][len(text2)] will be the length of the LCS.

C++ Code

#include <vector>
#include <string>
#include <algorithm>

class Solution {
public:
    int longestCommonSubsequence(std::string text1, std::string text2) {
        int m = text1.size();
        int n = text2.size();
        std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
        
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (text1[i - 1] == text2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        
        return dp[m][n];
    }
};

Java Code

class Solution {
    public int longestCommonSubsequence(String text1, String text2) {
        int m = text1.length();
        int n = text2.length();
        int[][] dp = new int[m + 1][n + 1];
        
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        
        return dp[m][n];
    }
}

Time and Space Complexity Analysis

  • Time Complexity: The algorithm runs in O(m*n) time where m and n are the lengths of the two input strings text1 and text2, due to the double for-loop.

  • Space Complexity: The space complexity is O(m*n) for storing the results in a 2D table. However, we can optimize this to O(min(m, n)) by using two one-dimensional arrays alternatively.

Common Mistakes to Avoid

  • Not handling edge cases where either string is empty.
  • Initializing the dp table incorrectly.
  • Misunderstanding the recurrence relation (particularly confusing subsequences with substrings).

Similar Problems on LeetCode

Additional Resources and References

This illustrative guide should provide a comprehensive understanding of tackling the LCS problem using dynamic programming. By mastering such fundamental DP problems, one can handle a wide range of real-world scenarios effectively.