Edit Distance

Hero Image

DT

Dhaval Trivedi

Co-founder, Airtribe

LeetCode Problem: Edit Distance

LeetCode's "Edit Distance" problem is a classic algorithmic challenge that tests a user's understanding of dynamic programming. This problem, also known as the Levenshtein Distance, is essential for tasks involving string manipulation, making it a staple for those delving into algorithmic programming and competitive coding.


Problem Overview

The problem statement is as follows: Given two strings word1 and word2, find the minimum number of operations required to convert word1 into word2. You have three types of operations available:

  1. Insert a character.
  2. Delete a character.
  3. Replace a character.

This problem is a fundamental exercise in understanding and implementing dynamic programming concepts.


Understanding the Problem

To gain better clarity, let's consider an example: convert the string horse to ros.

  • Change h to r: rorse
  • Delete r: rose
  • Delete e: ros

The total number of operations is 3, making the edit distance 3.

The challenge lies in determining the optimal sequence of these operations to minimize the total count.


Key DSA Concepts and Theory

Dynamic Programming (DP)

Dynamic Programming is an optimization technique used to solve complex problems by breaking them down into simpler subproblems. It is often compared to the divide and conquer technique, but unlike that approach, DP solves each subproblem just once and stores their solutions - using a memory-based data structure (array, map, etc.).

Matrix Representation

In the context of the Edit Distance problem, the state can be represented using a 2D matrix (DP table) where dp[i][j] represents the minimum edit distance between word1 up to i characters and word2 up to j characters.

Recursive Formulation

The transition between states departs from three possibilities:

  • If the last characters are the same, no operation is needed, and we rely on the previous state: [ dp[i][j] = dp[i-1][j-1] ]
  • If they differ, we can consider:
    • Insertion: dp[i][j-1] + 1
    • Deletion: dp[i-1][j] + 1
    • Replacement: dp[i-1][j-1] + 1

The final recursion formula becomes: [ dp[i][j] = \text{min}(dp[i-1][j-1] + \text{cost}, dp[i-1][j] + 1, dp[i][j-1] + 1) ] where cost is 0 if word1[i-1] == word2[j-1], otherwise 1.


Solution Approach

Step-by-step Solution

We'll initialize a 2D matrix and iterate over each substring combination to compute minimal operations.

Initialization

  1. Base Case:
    • To convert a string of length i to an empty string, it takes i deletions.
    • To convert an empty string to a string of length j, it takes j insertions.

This establishes our base conditions:

  • dp[i][0] = i
  • dp[0][j] = j

Iterative Solution

  1. Fill the DP table using nested loops:
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (word1[i-1] == word2[j-1]) {
                dp[i][j] = dp[i-1][j-1];  // No operation required
            } else {
                dp[i][j] = min(dp[i-1][j-1] + 1, // Replacement
                               min(dp[i-1][j] + 1, // Deletion
                                   dp[i][j-1] + 1)); // Insertion
            }
        }
    }
    
  2. The final result is found in dp[m][n].

C++ Code Implementation

#include <vector>
#include <string>
#include <algorithm>
using namespace std;

int minDistance(string word1, string word2) {
    int m = word1.size();
    int n = word2.size();
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    for (int i = 1; i <= m; i++) {
        dp[i][0] = i;
    }
    for (int j = 1; j <= n; j++) {
        dp[0][j] = j;
    }

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = min(dp[i - 1][j - 1], min(dp[i - 1][j], dp[i][j - 1])) + 1;
            }
        }
    }
    return dp[m][n];
}

Java Code Implementation

class Solution {
    public int minDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();
        int[][] dp = new int[m + 1][n + 1];

        for (int i = 1; i <= m; i++) {
            dp[i][0] = i;
        }
        for (int j = 1; j <= n; j++) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = Math.min(dp[i - 1][j - 1], Math.min(dp[i - 1][j], dp[i][j - 1])) + 1;
                }
            }
        }
        return dp[m][n];
    }
}

Time and Space Complexity Analysis

Time Complexity

The solution requires getting through the entire dp table which has dimensions (m + 1) x (n + 1). Therefore, the time complexity of the solution is:

[ O(m \times n) ]

where m and n are the lengths of word1 and word2, respectively.

Space Complexity

The space complexity is also:

[ O(m \times n) ]

because we need a 2D array (or table) to store the results of subproblems for each combination of substrings.


Common Mistakes to Avoid

  1. Confusion with indices: Remember that indices i and j in dp correspond to lengths of substrings, hence need careful handling with respect to conditions.
  2. Off-by-one errors: Initialization should carefully index from 1 upwards when assigning values to initially empty strings.

Similar Problems on LeetCode

For further practice, you may consider attempting these related problems to deepen your understanding:


Additional Resources and References

To enhance your understanding of the Edit Distance problem and dynamic programming, consider exploring these resources:

  • Introduction to Algorithms by Thomas H. Cormen: Chapter on Dynamic Programming.
  • GeeksforGeeks has excellent articles on dynamic programming and edit distance.
  • Hackerrank and Codeforces provide additional competitive programming challenges to practice these concepts.

A thorough examination of this problem and these resources will enhance both your understanding and skill in applying dynamic programming strategies.