Longest Common Subsequence

Problem Overview
The Longest Common Subsequence (LCS) problem is a classical problem in computer science and dynamic programming. The problem requires finding the longest subsequence common to two given sequences (strings) where the subsequence consists of elements that appear in both sequences in the same relative order, but not necessarily consecutively.
Example:
- Input:
text1 = "abcde"
,text2 = "ace"
- Output:
3
- Explanation: The longest common subsequence is "ace" with the length of 3.
Understanding the Problem
To grasp the concept of the Longest Common Subsequence, imagine comparing two strands of alphabets. Your task is not to rearrange them but to trace letters in the same order from both strands and see how long a match can be formed. The need for this calculation arises in various fields, including bioinformatics for DNA sequence analysis, version control systems, and diff tools for comparing file differences.
Key DSA Concepts and Theory
Dynamic Programming
Dynamic Programming (DP) is a method for solving complex problems by breaking them down into simpler subproblems. It is applicable where the optimal solution for a problem can be constructed efficiently from optimal solutions for its subproblems. The LCS problem uses DP due to overlapping subproblems and optimal substructure properties.
Subsequence vs Subarray
- Subsequence: Derived by deleting some elements of the array without changing the order of the remaining elements.
- Subarray: A contiguous part of the array or string.
In LCS, we are interested in subsequence, not subarray.
Recurrence Relation
The crux of solving the LCS problem via DP lies in understanding its recurrence relation. Define dp[i][j]
as the length of LCS of substrings text1[0...i-1]
and text2[0...j-1]
. The recurrence relation is given by:
- If
text1[i-1] == text2[j-1]
, thendp[i][j] = dp[i-1][j-1] + 1
. - Otherwise,
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
.
Initialization
For any i
, dp[i][0] = 0
and for any j
, dp[0][j] = 0
because the LCS of any string with an empty string is 0.
Solution Approach
The idea is to fill a DP table using a bottom-up approach by keeping a table to store lengths of longest common subsequence of substrings.
Step-by-Step Explanation
- Create a table
dp
with dimensions(len(text1)+1) x (len(text2)+1)
. - Initialize all cells to 0.
- Iterate over both strings:
- For matches, increment the diagonal value by 1.
- For mismatches, take the maximum from the left cell or the upper cell.
- The value at
dp[len(text1)][len(text2)]
will be the length of the LCS.
C++ Code
#include <vector>
#include <string>
#include <algorithm>
class Solution {
public:
int longestCommonSubsequence(std::string text1, std::string text2) {
int m = text1.size();
int n = text2.size();
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1[i - 1] == text2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
};
Java Code
class Solution {
public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();
int[][] dp = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
return dp[m][n];
}
}
Time and Space Complexity Analysis
Time Complexity: The algorithm runs in O(m*n) time where
m
andn
are the lengths of the two input stringstext1
andtext2
, due to the double for-loop.Space Complexity: The space complexity is O(m*n) for storing the results in a 2D table. However, we can optimize this to O(min(m, n)) by using two one-dimensional arrays alternatively.
Common Mistakes to Avoid
- Not handling edge cases where either string is empty.
- Initializing the
dp
table incorrectly. - Misunderstanding the recurrence relation (particularly confusing subsequences with substrings).
Similar Problems on LeetCode
- 1143. Longest Common Subsequence
- 72. Edit Distance
- 712. Minimum ASCII Delete Sum for Two Strings
- 516. Longest Palindromic Subsequence
Additional Resources and References
- GeeksforGeeks - Longest Common Subsequence
- CLRS (Introduction to Algorithms) by Thomas H. Cormen for detailed DP explanation.
- Dynamic Programming Patterns
This illustrative guide should provide a comprehensive understanding of tackling the LCS problem using dynamic programming. By mastering such fundamental DP problems, one can handle a wide range of real-world scenarios effectively.