Edit Distance

LeetCode Problem: Edit Distance
LeetCode's "Edit Distance" problem is a classic algorithmic challenge that tests a user's understanding of dynamic programming. This problem, also known as the Levenshtein Distance, is essential for tasks involving string manipulation, making it a staple for those delving into algorithmic programming and competitive coding.
Problem Overview
The problem statement is as follows: Given two strings word1
and word2
, find the minimum number of operations required to convert word1
into word2
. You have three types of operations available:
- Insert a character.
- Delete a character.
- Replace a character.
This problem is a fundamental exercise in understanding and implementing dynamic programming concepts.
Understanding the Problem
To gain better clarity, let's consider an example: convert the string horse
to ros
.
- Change
h
tor
:rorse
- Delete
r
:rose
- Delete
e
:ros
The total number of operations is 3, making the edit distance 3.
The challenge lies in determining the optimal sequence of these operations to minimize the total count.
Key DSA Concepts and Theory
Dynamic Programming (DP)
Dynamic Programming is an optimization technique used to solve complex problems by breaking them down into simpler subproblems. It is often compared to the divide and conquer technique, but unlike that approach, DP solves each subproblem just once and stores their solutions - using a memory-based data structure (array, map, etc.).
Matrix Representation
In the context of the Edit Distance problem, the state can be represented using a 2D matrix (DP table) where dp[i][j]
represents the minimum edit distance between word1
up to i
characters and word2
up to j
characters.
Recursive Formulation
The transition between states departs from three possibilities:
- If the last characters are the same, no operation is needed, and we rely on the previous state: [ dp[i][j] = dp[i-1][j-1] ]
- If they differ, we can consider:
- Insertion:
dp[i][j-1] + 1
- Deletion:
dp[i-1][j] + 1
- Replacement:
dp[i-1][j-1] + 1
- Insertion:
The final recursion formula becomes:
[
dp[i][j] = \text{min}(dp[i-1][j-1] + \text{cost}, dp[i-1][j] + 1, dp[i][j-1] + 1)
]
where cost
is 0 if word1[i-1] == word2[j-1]
, otherwise 1.
Solution Approach
Step-by-step Solution
We'll initialize a 2D matrix and iterate over each substring combination to compute minimal operations.
Initialization
- Base Case:
- To convert a string of length
i
to an empty string, it takesi
deletions. - To convert an empty string to a string of length
j
, it takesj
insertions.
- To convert a string of length
This establishes our base conditions:
dp[i][0] = i
dp[0][j] = j
Iterative Solution
- Fill the DP table using nested loops:
for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (word1[i-1] == word2[j-1]) { dp[i][j] = dp[i-1][j-1]; // No operation required } else { dp[i][j] = min(dp[i-1][j-1] + 1, // Replacement min(dp[i-1][j] + 1, // Deletion dp[i][j-1] + 1)); // Insertion } } }
- The final result is found in
dp[m][n]
.
C++ Code Implementation
#include <vector>
#include <string>
#include <algorithm>
using namespace std;
int minDistance(string word1, string word2) {
int m = word1.size();
int n = word2.size();
vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
for (int i = 1; i <= m; i++) {
dp[i][0] = i;
}
for (int j = 1; j <= n; j++) {
dp[0][j] = j;
}
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (word1[i - 1] == word2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1];
} else {
dp[i][j] = min(dp[i - 1][j - 1], min(dp[i - 1][j], dp[i][j - 1])) + 1;
}
}
}
return dp[m][n];
}
Java Code Implementation
class Solution {
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();
int[][] dp = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
dp[i][0] = i;
}
for (int j = 1; j <= n; j++) {
dp[0][j] = j;
}
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1];
} else {
dp[i][j] = Math.min(dp[i - 1][j - 1], Math.min(dp[i - 1][j], dp[i][j - 1])) + 1;
}
}
}
return dp[m][n];
}
}
Time and Space Complexity Analysis
Time Complexity
The solution requires getting through the entire dp
table which has dimensions (m + 1) x (n + 1)
. Therefore, the time complexity of the solution is:
[ O(m \times n) ]
where m
and n
are the lengths of word1
and word2
, respectively.
Space Complexity
The space complexity is also:
[ O(m \times n) ]
because we need a 2D array (or table) to store the results of subproblems for each combination of substrings.
Common Mistakes to Avoid
- Confusion with indices: Remember that indices
i
andj
indp
correspond to lengths of substrings, hence need careful handling with respect to conditions. - Off-by-one errors: Initialization should carefully index from 1 upwards when assigning values to initially empty strings.
Similar Problems on LeetCode
For further practice, you may consider attempting these related problems to deepen your understanding:
- Longest Common Subsequence - LeetCode 1143
- Delete Operation for Two Strings - LeetCode 583
Additional Resources and References
To enhance your understanding of the Edit Distance problem and dynamic programming, consider exploring these resources:
- Introduction to Algorithms by Thomas H. Cormen: Chapter on Dynamic Programming.
- GeeksforGeeks has excellent articles on dynamic programming and edit distance.
- Hackerrank and Codeforces provide additional competitive programming challenges to practice these concepts.
A thorough examination of this problem and these resources will enhance both your understanding and skill in applying dynamic programming strategies.