Implement strStr (Find Substring)

Find the first occurrence of a substring in a string, similar to strstr() function.

Language Selection

Choose your preferred programming language

Showing: Python

Implement strStr (Find Substring)

Problem Statement

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

This is similar to the strstr() function in C.

Examples

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0.

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode".

Example 3:

Input: haystack = "hello", needle = "ll"
Output: 2
Explanation: "ll" occurs at index 2.

Constraints

  • 1 <= haystack.length, needle.length <= 10^4
  • haystack and needle consist of only lowercase English characters

Approach 1: Brute Force (Two Pointers)

Algorithm

Use two pointers to compare characters of needle with haystack at each possible starting position.

Steps:

  1. For each possible starting position in haystack
  2. Compare characters of needle with haystack
  3. Return index if all characters match
  4. Return -1 if no match found

Implementation

Python:

def str_str(haystack, needle):
    """
    Find first occurrence of needle in haystack using brute force
    Time: O(n*m)
    Space: O(1)
    """
    if not needle:
        return 0
    
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i+len(needle)] == needle:
            return i
    
    return -1

Java:

class Solution {
    /**
     * Find first occurrence of needle in haystack using brute force
     * Time: O(n*m)
     * Space: O(1)
     */
    public int strStr(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        
        for (int i = 0; i <= haystack.length() - needle.length(); i++) {
            if (haystack.substring(i, i + needle.length()).equals(needle)) {
                return i;
            }
        }
        
        return -1;
    }
}

Go:

// strStr - Find first occurrence of needle in haystack using brute force
// Time: O(n*m)
// Space: O(1)
func strStr(haystack, needle string) int {
    if len(needle) == 0 {
        return 0
    }
    
    for i := 0; i <= len(haystack)-len(needle); i++ {
        if haystack[i:i+len(needle)] == needle {
            return i
        }
    }
    
    return -1
}

JavaScript:

/**
 * Find first occurrence of needle in haystack using brute force
 * Time: O(n*m)
 * Space: O(1)
 */
function strStr(haystack, needle) {
    if (needle.length === 0) {
        return 0;
    }
    
    for (let i = 0; i <= haystack.length - needle.length; i++) {
        if (haystack.substring(i, i + needle.length) === needle) {
            return i;
        }
    }
    
    return -1;
}

C#:

public class Solution {
    /// <summary>
    /// Find first occurrence of needle in haystack using brute force
    /// Time: O(n*m)
    /// Space: O(1)
    /// </summary>
    public int StrStr(string haystack, string needle) {
        if (string.IsNullOrEmpty(needle)) {
            return 0;
        }
        
        for (int i = 0; i <= haystack.Length - needle.Length; i++) {
            if (haystack.Substring(i, needle.Length) == needle) {
                return i;
            }
        }
        
        return -1;
    }
}

Complexity Analysis

  • Time Complexity: O(n*m) where n and m are lengths of haystack and needle
  • Space Complexity: O(1)

Approach 2: KMP Algorithm (Optimized)

Algorithm

Use the Knuth-Morris-Pratt algorithm with a failure function to avoid unnecessary comparisons.

Steps:

  1. Build failure function (LPS array) for the needle
  2. Use two pointers to match characters
  3. When mismatch occurs, use failure function to skip characters
  4. Return index when complete match found

Implementation

Python:

def str_str_kmp(haystack, needle):
    """
    Find first occurrence using KMP algorithm
    Time: O(n+m)
    Space: O(m)
    """
    if not needle:
        return 0
    
    # Build failure function (LPS array)
    lps = build_lps(needle)
    
    i = j = 0  # i for haystack, j for needle
    
    while i < len(haystack):
        if haystack[i] == needle[j]:
            i += 1
            j += 1
        
        if j == len(needle):
            return i - j
        elif i < len(haystack) and haystack[i] != needle[j]:
            if j != 0:
                j = lps[j-1]
            else:
                i += 1
    
    return -1

def build_lps(pattern):
    """Build Longest Proper Prefix which is also Suffix array"""
    lps = [0] * len(pattern)
    length = 0
    i = 1
    
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length-1]
            else:
                lps[i] = 0
                i += 1
    
    return lps

Java:

class Solution {
    /**
     * Find first occurrence using KMP algorithm
     * Time: O(n+m)
     * Space: O(m)
     */
    public int strStrKMP(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        
        int[] lps = buildLPS(needle);
        int i = 0, j = 0;  // i for haystack, j for needle
        
        while (i < haystack.length()) {
            if (haystack.charAt(i) == needle.charAt(j)) {
                i++;
                j++;
            }
            
            if (j == needle.length()) {
                return i - j;
            } else if (i < haystack.length() && haystack.charAt(i) != needle.charAt(j)) {
                if (j != 0) {
                    j = lps[j-1];
                } else {
                    i++;
                }
            }
        }
        
        return -1;
    }
    
    private int[] buildLPS(String pattern) {
        int[] lps = new int[pattern.length()];
        int length = 0;
        int i = 1;
        
        while (i < pattern.length()) {
            if (pattern.charAt(i) == pattern.charAt(length)) {
                length++;
                lps[i] = length;
                i++;
            } else {
                if (length != 0) {
                    length = lps[length-1];
                } else {
                    lps[i] = 0;
                    i++;
                }
            }
        }
        
        return lps;
    }
}

Complexity Analysis

  • Time Complexity: O(n+m) where n and m are lengths of haystack and needle
  • Space Complexity: O(m) for the LPS array

Key Insights

  1. Brute Force: Simple but inefficient O(n*m) approach
  2. KMP Algorithm: Optimized O(n+m) approach using failure function
  3. LPS Array: Longest Proper Prefix which is also Suffix array
  4. Pattern Matching: Avoid unnecessary character comparisons
  5. Edge Cases: Handle empty needle and single character cases

Edge Cases

  1. Empty needle: Return 0 (needle found at start)
  2. Needle longer than haystack: Return -1
  3. Single character: Both haystack and needle have one character
  4. No match: Needle not found in haystack
  5. Multiple matches: Return index of first occurrence

Test Cases

# Test cases
assert str_str("sadbutsad", "sad") == 0
assert str_str("leetcode", "leeto") == -1
assert str_str("hello", "ll") == 2
assert str_str("", "") == 0
assert str_str("a", "a") == 0
assert str_str("abc", "c") == 2

Follow-up Questions

  1. Find All Occurrences: Find all indices where needle occurs
  2. Pattern Matching: Implement regex pattern matching
  3. Rabin-Karp: Use rolling hash for string matching
  4. Boyer-Moore: Implement Boyer-Moore string matching
  5. Suffix Array: Build suffix array for efficient string operations

Common Mistakes

  1. Index bounds: Going out of bounds when checking substrings
  2. Empty string handling: Not properly handling empty needle
  3. Off-by-one errors: Incorrect loop bounds
  4. KMP implementation: Incorrect LPS array construction
  5. Return value: Returning wrong index or not handling -1 case

Interview Tips

  1. Start simple: Begin with brute force approach
  2. Optimize when needed: Mention KMP for better performance
  3. Handle edge cases: Always consider empty strings
  4. Explain KMP: If implementing KMP, explain the LPS array concept
  5. Discuss trade-offs: Time vs space complexity considerations

Variations

  1. Case-insensitive: Ignore case differences
  2. Multiple patterns: Find multiple patterns simultaneously
  3. Approximate matching: Allow some mismatches
  4. Circular string: Handle circular string matching
  5. Unicode support: Handle Unicode characters properly