Language Selection
Choose your preferred programming language
Showing: Python
Implement strStr (Find Substring)
Problem Statement
Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
This is similar to the strstr() function in C.
Examples
Example 1:
Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode".
Example 3:
Input: haystack = "hello", needle = "ll"
Output: 2
Explanation: "ll" occurs at index 2.
Constraints
1 <= haystack.length, needle.length <= 10^4haystackandneedleconsist of only lowercase English characters
Approach 1: Brute Force (Two Pointers)
Algorithm
Use two pointers to compare characters of needle with haystack at each possible starting position.
Steps:
- For each possible starting position in haystack
- Compare characters of needle with haystack
- Return index if all characters match
- Return -1 if no match found
Implementation
Python:
def str_str(haystack, needle):
"""
Find first occurrence of needle in haystack using brute force
Time: O(n*m)
Space: O(1)
"""
if not needle:
return 0
for i in range(len(haystack) - len(needle) + 1):
if haystack[i:i+len(needle)] == needle:
return i
return -1
Java:
class Solution {
/**
* Find first occurrence of needle in haystack using brute force
* Time: O(n*m)
* Space: O(1)
*/
public int strStr(String haystack, String needle) {
if (needle.isEmpty()) {
return 0;
}
for (int i = 0; i <= haystack.length() - needle.length(); i++) {
if (haystack.substring(i, i + needle.length()).equals(needle)) {
return i;
}
}
return -1;
}
}
Go:
// strStr - Find first occurrence of needle in haystack using brute force
// Time: O(n*m)
// Space: O(1)
func strStr(haystack, needle string) int {
if len(needle) == 0 {
return 0
}
for i := 0; i <= len(haystack)-len(needle); i++ {
if haystack[i:i+len(needle)] == needle {
return i
}
}
return -1
}
JavaScript:
/**
* Find first occurrence of needle in haystack using brute force
* Time: O(n*m)
* Space: O(1)
*/
function strStr(haystack, needle) {
if (needle.length === 0) {
return 0;
}
for (let i = 0; i <= haystack.length - needle.length; i++) {
if (haystack.substring(i, i + needle.length) === needle) {
return i;
}
}
return -1;
}
C#:
public class Solution {
/// <summary>
/// Find first occurrence of needle in haystack using brute force
/// Time: O(n*m)
/// Space: O(1)
/// </summary>
public int StrStr(string haystack, string needle) {
if (string.IsNullOrEmpty(needle)) {
return 0;
}
for (int i = 0; i <= haystack.Length - needle.Length; i++) {
if (haystack.Substring(i, needle.Length) == needle) {
return i;
}
}
return -1;
}
}
Complexity Analysis
- Time Complexity: O(n*m) where n and m are lengths of haystack and needle
- Space Complexity: O(1)
Approach 2: KMP Algorithm (Optimized)
Algorithm
Use the Knuth-Morris-Pratt algorithm with a failure function to avoid unnecessary comparisons.
Steps:
- Build failure function (LPS array) for the needle
- Use two pointers to match characters
- When mismatch occurs, use failure function to skip characters
- Return index when complete match found
Implementation
Python:
def str_str_kmp(haystack, needle):
"""
Find first occurrence using KMP algorithm
Time: O(n+m)
Space: O(m)
"""
if not needle:
return 0
# Build failure function (LPS array)
lps = build_lps(needle)
i = j = 0 # i for haystack, j for needle
while i < len(haystack):
if haystack[i] == needle[j]:
i += 1
j += 1
if j == len(needle):
return i - j
elif i < len(haystack) and haystack[i] != needle[j]:
if j != 0:
j = lps[j-1]
else:
i += 1
return -1
def build_lps(pattern):
"""Build Longest Proper Prefix which is also Suffix array"""
lps = [0] * len(pattern)
length = 0
i = 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length-1]
else:
lps[i] = 0
i += 1
return lps
Java:
class Solution {
/**
* Find first occurrence using KMP algorithm
* Time: O(n+m)
* Space: O(m)
*/
public int strStrKMP(String haystack, String needle) {
if (needle.isEmpty()) {
return 0;
}
int[] lps = buildLPS(needle);
int i = 0, j = 0; // i for haystack, j for needle
while (i < haystack.length()) {
if (haystack.charAt(i) == needle.charAt(j)) {
i++;
j++;
}
if (j == needle.length()) {
return i - j;
} else if (i < haystack.length() && haystack.charAt(i) != needle.charAt(j)) {
if (j != 0) {
j = lps[j-1];
} else {
i++;
}
}
}
return -1;
}
private int[] buildLPS(String pattern) {
int[] lps = new int[pattern.length()];
int length = 0;
int i = 1;
while (i < pattern.length()) {
if (pattern.charAt(i) == pattern.charAt(length)) {
length++;
lps[i] = length;
i++;
} else {
if (length != 0) {
length = lps[length-1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
}
Complexity Analysis
- Time Complexity: O(n+m) where n and m are lengths of haystack and needle
- Space Complexity: O(m) for the LPS array
Key Insights
- Brute Force: Simple but inefficient O(n*m) approach
- KMP Algorithm: Optimized O(n+m) approach using failure function
- LPS Array: Longest Proper Prefix which is also Suffix array
- Pattern Matching: Avoid unnecessary character comparisons
- Edge Cases: Handle empty needle and single character cases
Edge Cases
- Empty needle: Return 0 (needle found at start)
- Needle longer than haystack: Return -1
- Single character: Both haystack and needle have one character
- No match: Needle not found in haystack
- Multiple matches: Return index of first occurrence
Test Cases
# Test cases
assert str_str("sadbutsad", "sad") == 0
assert str_str("leetcode", "leeto") == -1
assert str_str("hello", "ll") == 2
assert str_str("", "") == 0
assert str_str("a", "a") == 0
assert str_str("abc", "c") == 2
Follow-up Questions
- Find All Occurrences: Find all indices where needle occurs
- Pattern Matching: Implement regex pattern matching
- Rabin-Karp: Use rolling hash for string matching
- Boyer-Moore: Implement Boyer-Moore string matching
- Suffix Array: Build suffix array for efficient string operations
Common Mistakes
- Index bounds: Going out of bounds when checking substrings
- Empty string handling: Not properly handling empty needle
- Off-by-one errors: Incorrect loop bounds
- KMP implementation: Incorrect LPS array construction
- Return value: Returning wrong index or not handling -1 case
Interview Tips
- Start simple: Begin with brute force approach
- Optimize when needed: Mention KMP for better performance
- Handle edge cases: Always consider empty strings
- Explain KMP: If implementing KMP, explain the LPS array concept
- Discuss trade-offs: Time vs space complexity considerations
Variations
- Case-insensitive: Ignore case differences
- Multiple patterns: Find multiple patterns simultaneously
- Approximate matching: Allow some mismatches
- Circular string: Handle circular string matching
- Unicode support: Handle Unicode characters properly