Index of the First Occurrence in a String (using KMP algo / LPS(pi) array)

Hero Image

DT

Dhaval Trivedi

Co-founder, Airtribe

Problem Overview

The problem at hand is the classic "Implement strStr()" which is a common question that tests understanding of string manipulation and searching algorithms. The challenge is to implement a function that finds the needle (substring) in the haystack (string) and returns the index of its first occurrence. It’s essentially the same as using the indexOf method provided in most programming languages.

URL: Implement strStr() on LeetCode

Understanding the Problem

Given two strings, needle and haystack, the goal is to return the index of the first occurrence of needle within haystack. If needle is an empty string, return 0. If needle does not occur in haystack, return -1.

  • Example 1:

    Input: haystack = "hello", needle = "ll"
    Output: 2
    
  • Example 2:

    Input: haystack = "aaaaa", needle = "bba"
    Output: -1
    

Key DSA Concepts and Theory

1. String Manipulation

Understanding how strings work in terms of indexes and operations like iteration, slicing, or accessing substrings is crucial.

2. Searching Algorithms

The problem can be tackled using a simple linear scan (brute force), but more efficient string-searching algorithms like the Knuth-Morris-Pratt (KMP) algorithm could be employed for better performance.

3. Edge Cases

Consider cases where needle is an empty string, which by definition should return 0, or when needle's length is greater than haystack.

Solution Approach

The problem can be solved using a straightforward linear scan approach. Although simple, it is effective unless optimization is necessary.

Step-by-Step Approach

Linear Scan

  1. Initialization:

    • First, handle the special case where needle is an empty string. In such cases, directly return 0.
  2. Iterate Through Haystack:

    • Start from the very first character of haystack and take substrings of the same length as needle.
  3. Compare Substrings:

    • For each substring, compare it to needle. If they match, return the starting index of this substring.
  4. Return Result:

    • If the loop completes without finding needle, return -1.

Code Implementation

C++

class Solution {
public:
    int strStr(string haystack, string needle) {
        if (needle.empty()) return 0;  // Special case

        int m = haystack.length();
        int n = needle.length();
        
        for (int i = 0; i <= m - n; ++i) {
            if (haystack.substr(i, n) == needle) {
                return i;
            }
        }
        return -1;
    }
};

Java

class Solution {
    public int strStr(String haystack, String needle) {
        if (needle.isEmpty()) return 0;  // Special case

        int m = haystack.length();
        int n = needle.length();

        for (int i = 0; i <= m - n; i++) {
            if (haystack.substring(i, i + n).equals(needle)) {
                return i;
            }
        }
        return -1;
    }
}

Time and Space Complexity Analysis

  • Time Complexity: O(m * n), where m is the length of haystack and n is the length of needle. In the worst case, for each character in haystack, we might compare up to n characters of needle.
  • Space Complexity: O(1), excluding the space required to store input or output. The use of additional space is constant.

Common Mistakes to Avoid

  • Forgetting to handle the empty needle case at the beginning can lead to incorrect results.
  • Not checking the bounds properly, especially when taking substrings, which might lead to IndexOutOfBounds errors, especially in languages like Java.

Similar Problems on LeetCode

Other questions that involve string searching and manipulation:

  • LeetCode 796: Rotate String
  • LeetCode 28: Find the Index of the First Occurrence in a String (variation of the current problem)

Additional Resources and References

This article delves into understanding and effectively implementing the "Implement strStr()" problem, ensuring a deep grasp of string manipulation and search algorithms, potentially broadening your understanding for more advanced algorithmic challenges.