Regular Expression Matching

Implement regular expression matching with support for '.' and '*'

Language Selection

Choose your preferred programming language

Showing: Python

Regular Expression Matching

Problem Statement

Given an input string s and a pattern p, implement regular expression matching with support for ‘.’ and ‘*’ where:

  • ‘.’ Matches any single character
  • ‘*’ Matches zero or more of the preceding element

The matching should cover the entire input string (not partial).

Input/Output Specifications

  • Input: Two strings s and p where 1 <= s.length <= 20, 1 <= p.length <= 30
  • Output: A boolean indicating whether the pattern matches the string

Constraints

  • 1 <= s.length <= 20
  • 1 <= p.length <= 30
  • s contains only lowercase English letters
  • p contains only lowercase English letters, ‘.’, and ‘*’

Examples

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'.

Example 3:

Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Solution Approaches

Approach 1: Dynamic Programming (Bottom-Up) - Optimal

Algorithm Explanation:

  1. Create a 2D DP table where dp[i][j] represents if s[0:i] matches p[0:j]
  2. Handle base cases for empty string and pattern
  3. For each character, check if it matches and handle ‘*’ cases
  4. ‘*’ can match zero or more of the preceding character

Implementation:

Python:

def isMatch(s, p):
    """
    Regular expression matching using dynamic programming
    Time: O(m * n)
    Space: O(m * n)
    """
    m, n = len(s), len(p)
    
    # dp[i][j] represents if s[0:i] matches p[0:j]
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    
    # Base case: empty string matches empty pattern
    dp[0][0] = True
    
    # Handle patterns like a*, a*b*, a*b*c*
    for j in range(2, n + 1):
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]
    
    # Fill the DP table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                # Case 1: '*' matches zero characters
                dp[i][j] = dp[i][j - 2]
                
                # Case 2: '*' matches one or more characters
                if matches(s, p, i - 1, j - 2):
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
            else:
                # Direct character match
                if matches(s, p, i - 1, j - 1):
                    dp[i][j] = dp[i - 1][j - 1]
    
    return dp[m][n]

def matches(s, p, i, j):
    """Check if s[i] matches p[j]"""
    if j < 0:
        return False
    return p[j] == '.' or s[i] == p[j]

Java:

class Solution {
    public boolean isMatch(String s, String p) {
        int m = s.length();
        int n = p.length();
        
        boolean[][] dp = new boolean[m + 1][n + 1];
        dp[0][0] = true;
        
        for (int j = 2; j <= n; j++) {
            if (p.charAt(j - 1) == '*') {
                dp[0][j] = dp[0][j - 2];
            }
        }
        
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (p.charAt(j - 1) == '*') {
                    dp[i][j] = dp[i][j - 2];
                    if (matches(s, p, i - 1, j - 2)) {
                        dp[i][j] = dp[i][j] || dp[i - 1][j];
                    }
                } else {
                    if (matches(s, p, i - 1, j - 1)) {
                        dp[i][j] = dp[i - 1][j - 1];
                    }
                }
            }
        }
        
        return dp[m][n];
    }
    
    private boolean matches(String s, String p, int i, int j) {
        if (j < 0) return false;
        return p.charAt(j) == '.' || s.charAt(i) == p.charAt(j);
    }
}

Complexity Analysis:

  • Time Complexity: O(m * n) - Fill the DP table once
  • Space Complexity: O(m * n) - DP table storage

Key Insights

  1. Dynamic Programming: Break down the problem into smaller subproblems
  2. Pattern Matching: Handle ‘.’ and ‘*’ characters correctly
  3. State Transitions: Define clear transitions between states
  4. Base Cases: Handle empty string and pattern cases

Edge Cases

  1. Empty String: Handle empty string matching
  2. Empty Pattern: Handle empty pattern
  3. Only ‘*’: Pattern with only ‘*’ characters
  4. Multiple ‘*’: Patterns with multiple ‘*’ characters
  5. No Match: Cases where no match is possible

Test Cases

def test_isMatch():
    assert isMatch("aa", "a") == False
    assert isMatch("aa", "a*") == True
    assert isMatch("ab", ".*") == True
    assert isMatch("aab", "c*a*b") == True
    assert isMatch("mississippi", "mis*is*p*.") == False
    print("All tests passed!")

test_isMatch()

Follow-up Questions

  1. Wildcard Matching: Handle ‘?’ and ‘*’ wildcards?
  2. Multiple Patterns: Match against multiple patterns?
  3. Pattern Validation: Validate if pattern is well-formed?
  4. Performance Optimization: Optimize for specific patterns?
  5. Unicode Support: Handle Unicode characters?

Common Mistakes

  1. Incorrect ‘*’ Handling: Not handling zero or multiple matches correctly
  2. Index Errors: Off-by-one errors in string indexing
  3. Base Case Missing: Not handling empty string/pattern cases
  4. State Transitions: Incorrect DP state transitions
  5. Memoization: Not properly implementing memoization

Interview Tips

  1. Start with Recursion: Begin with recursive approach
  2. Add Memoization: Optimize with memoization
  3. Convert to DP: Show bottom-up DP solution
  4. Handle Edge Cases: Cover empty strings and special patterns
  5. Explain Complexity: Discuss time and space complexity

Concept Explanations

Dynamic Programming: A technique to solve problems by breaking them into smaller subproblems and storing results to avoid redundant calculations.

Regular Expression: A sequence of characters that defines a search pattern, commonly used for pattern matching in strings.

Pattern Matching: The process of finding patterns within strings, which is fundamental to many text processing applications.

When to Use: Use this pattern when you need to match strings against patterns with special characters, especially when the pattern can match variable-length sequences.