Regular Expression Matching

Implement regular expression matching with '.' and '*'

Language Selection

Choose your preferred programming language

Showing: Python

Problem Statement

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:

  • '.' Matches any single character
  • '*' Matches zero or more of the preceding element

The matching should cover the entire input string (not partial).

Input:

  • A string s (the input string)
  • A string p (the pattern)

Output:

  • Boolean indicating if the entire string matches the pattern

Constraints:

  • 1 ≤ s.length ≤ 20
  • 1 ≤ p.length ≤ 30
  • s contains only lowercase English letters
  • p contains only lowercase English letters, ‘.’, and ‘*’
  • It is guaranteed for each appearance of ‘*’, there will be a previous valid character to match

Examples:

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. 
Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input: s = "aab", p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 2 times. Therefore, it matches "aab".

Solutions

Approach 1: 2D Dynamic Programming

Algorithm:

  1. Create 2D DP table where dp[i][j] = true if s[0:i] matches p[0:j]
  2. Base case: dp[0][0] = true (empty string matches empty pattern)
  3. Handle patterns with ‘*’ that can match zero characters
  4. For each cell, check character match and handle ‘*’ cases

Python:

def isMatch(s, p):
    """
    2D DP solution for regex matching
    Time: O(m * n) where m = len(s), n = len(p)
    Space: O(m * n) for DP table
    """
    m, n = len(s), len(p)
    
    # dp[i][j] = True if s[0:i] matches p[0:j]
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    
    # Base case: empty string matches empty pattern
    dp[0][0] = True
    
    # Handle patterns like "a*", "a*b*", etc. that can match empty string
    for j in range(2, n + 1):\n        # Pattern like \"a*\" can match empty string\n        if p[j-1] == '*':\n            dp[0][j] = dp[0][j-2]\n    \n    # Fill DP table\n    for i in range(1, m + 1):\n        for j in range(1, n + 1):\n            if p[j-1] == '*':\n                # '*' can match zero or more of preceding character\n                # Case 1: Use * to match zero occurrences (ignore \"x*\")\n                dp[i][j] = dp[i][j-2]\n                \n                # Case 2: Use * to match one or more occurrences\n                # Check if preceding character matches current character in s\n                if matches(s, p, i, j-1):  # s[i-1] matches p[j-2]\n                    dp[i][j] = dp[i][j] or dp[i-1][j]\n            else:\n                # Regular character or '.'\n                if matches(s, p, i, j):\n                    dp[i][j] = dp[i-1][j-1]\n    \n    return dp[m][n]\n\ndef matches(s, p, i, j):\n    \"\"\"Check if s[i-1] matches p[j-1]\"\"\"\n    return p[j-1] == '.' or s[i-1] == p[j-1]\n\ndef isMatch_verbose(s, p):\n    \"\"\"\n    Verbose version with detailed comments\n    Time: O(m * n)\n    Space: O(m * n)\n    \"\"\"\n    m, n = len(s), len(p)\n    dp = [[False] * (n + 1) for _ in range(m + 1)]\n    \n    # Base case\n    dp[0][0] = True\n    print(f\"Base case: dp[0][0] = {dp[0][0]}\")\n    \n    # Handle patterns that can match empty string\n    for j in range(2, n + 1):\n        if p[j-1] == '*':\n            dp[0][j] = dp[0][j-2]\n            print(f\"Empty string vs '{p[:j]}': dp[0][{j}] = {dp[0][j]}\")\n    \n    # Fill table\n    for i in range(1, m + 1):\n        for j in range(1, n + 1):\n            char_s = s[i-1]\n            char_p = p[j-1]\n            \n            if char_p == '*':\n                # Zero occurrences\n                dp[i][j] = dp[i][j-2]\n                \n                # One or more occurrences\n                if p[j-2] == '.' or s[i-1] == p[j-2]:\n                    dp[i][j] = dp[i][j] or dp[i-1][j]\n                \n                print(f\"s[:{i}]='{s[:i]}' vs p[:{j}]='{p[:j]}' (*): dp[{i}][{j}] = {dp[i][j]}\")\n            else:\n                # Regular character or '.'\n                if char_p == '.' or char_s == char_p:\n                    dp[i][j] = dp[i-1][j-1]\n                \n                print(f\"s[:{i}]='{s[:i]}' vs p[:{j}]='{p[:j]}' ({char_p}): dp[{i}][{j}] = {dp[i][j]}\")\n    \n    return dp[m][n]\n\n# Example usage\nprint(isMatch(\"aa\", \"a*\"))       # Output: True\nprint(isMatch(\"ab\", \".*\"))       # Output: True\nprint(isMatch(\"aab\", \"c*a*b\"))   # Output: True\n```\n\n**Java:**\n```java\nclass Solution {\n    /**\n     * 2D DP solution for regex matching\n     * Time: O(m * n)\n     * Space: O(m * n)\n     */\n    public boolean isMatch(String s, String p) {\n        int m = s.length();\n        int n = p.length();\n        \n        // dp[i][j] = true if s[0:i] matches p[0:j]\n        boolean[][] dp = new boolean[m + 1][n + 1];\n        \n        // Base case\n        dp[0][0] = true;\n        \n        // Handle patterns like \"a*\", \"a*b*\" that can match empty string\n        for (int j = 2; j <= n; j++) {\n            if (p.charAt(j - 1) == '*') {\n                dp[0][j] = dp[0][j - 2];\n            }\n        }\n        \n        // Fill DP table\n        for (int i = 1; i <= m; i++) {\n            for (int j = 1; j <= n; j++) {\n                if (p.charAt(j - 1) == '*') {\n                    // '*' can match zero or more of preceding character\n                    // Case 1: zero occurrences\n                    dp[i][j] = dp[i][j - 2];\n                    \n                    // Case 2: one or more occurrences\n                    if (matches(s, p, i, j - 1)) {\n                        dp[i][j] = dp[i][j] || dp[i - 1][j];\n                    }\n                } else {\n                    // Regular character or '.'\n                    if (matches(s, p, i, j)) {\n                        dp[i][j] = dp[i - 1][j - 1];\n                    }\n                }\n            }\n        }\n        \n        return dp[m][n];\n    }\n    \n    private boolean matches(String s, String p, int i, int j) {\n        return p.charAt(j - 1) == '.' || s.charAt(i - 1) == p.charAt(j - 1);\n    }\n}\n```\n\n### Approach 2: Recursion with Memoization\n\n**Python:**\n```python\ndef isMatch_recursive(s, p):\n    \"\"\"\n    Recursive solution with memoization\n    Time: O(m * n)\n    Space: O(m * n) for memoization + recursion stack\n    \"\"\"\n    memo = {}\n    \n    def dp(i, j):\n        # Base cases\n        if (i, j) in memo:\n            return memo[(i, j)]\n        \n        # If pattern is exhausted\n        if j == len(p):\n            return i == len(s)\n        \n        # Check if current characters match\n        first_match = i < len(s) and (p[j] == '.' or s[i] == p[j])\n        \n        # Handle '*' in pattern\n        if j + 1 < len(p) and p[j + 1] == '*':\n            # Two choices:\n            # 1. Don't use the \"x*\" pattern (match zero occurrences)\n            # 2. Use the \"x*\" pattern if first character matches\n            result = (dp(i, j + 2) or \n                     (first_match and dp(i + 1, j)))\n        else:\n            # No '*', must match current character\n            result = first_match and dp(i + 1, j + 1)\n        \n        memo[(i, j)] = result\n        return result\n    \n    return dp(0, 0)\n\ndef isMatch_recursive_verbose(s, p):\n    \"\"\"\n    Verbose recursive solution\n    Time: O(m * n)\n    Space: O(m * n)\n    \"\"\"\n    memo = {}\n    \n    def dp(i, j, depth=0):\n        indent = \"  \" * depth\n        print(f\"{indent}Checking s[{i}:] = '{s[i:]}' vs p[{j}:] = '{p[j:]}'\")\n        \n        if (i, j) in memo:\n            print(f\"{indent}Found in memo: {memo[(i, j)]}\")\n            return memo[(i, j)]\n        \n        # Pattern exhausted\n        if j == len(p):\n            result = i == len(s)\n            print(f\"{indent}Pattern exhausted, string exhausted: {result}\")\n            memo[(i, j)] = result\n            return result\n        \n        # Check current character match\n        first_match = i < len(s) and (p[j] == '.' or s[i] == p[j])\n        print(f\"{indent}First match: {first_match}\")\n        \n        if j + 1 < len(p) and p[j + 1] == '*':\n            print(f\"{indent}Found '*' pattern: {p[j]}*\")\n            # Zero occurrences\n            zero_match = dp(i, j + 2, depth + 1)\n            print(f\"{indent}Zero occurrences: {zero_match}\")\n            \n            # One or more occurrences\n            if first_match:\n                one_or_more = dp(i + 1, j, depth + 1)\n                print(f\"{indent}One or more occurrences: {one_or_more}\")\n                result = zero_match or one_or_more\n            else:\n                result = zero_match\n        else:\n            print(f\"{indent}Regular character match\")\n            result = first_match and dp(i + 1, j + 1, depth + 1)\n        \n        print(f\"{indent}Result: {result}\")\n        memo[(i, j)] = result\n        return result\n    \n    return dp(0, 0)\n\n# Example usage\nprint(isMatch_recursive(\"aab\", \"c*a*b\"))  # Output: True\n```\n\n## Key Insights\n\n1. **Two Key Cases**: Handle regular characters/dots and '*' patterns differently\n2. **'*' Flexibility**: '*' can match zero or more of the preceding character\n3. **Greedy vs Non-greedy**: '*' requires checking both zero and multiple matches\n4. **Base Cases**: Empty pattern can only match empty string (with exceptions)\n5. **State Definition**: dp[i][j] represents if s[0:i] matches p[0:j]\n\n## Edge Cases\n\n1. **Empty strings**: Both empty should return True\n2. **Pattern with '*'**: \"a*\" can match empty string\n3. **Only '.'**: Single '.' matches any single character\n4. **Only '*'**: Invalid pattern (guaranteed not to happen)\n5. **Multiple '*'**: \"a*b*\" can match various combinations\n\n## Test Cases\n\n```python\ndef test_regex_matching():\n    test_cases = [\n        (\"aa\", \"a\", False),\n        (\"aa\", \"a*\", True),\n        (\"ab\", \".*\", True),\n        (\"aab\", \"c*a*b\", True),\n        (\"mississippi\", \"mis*is*p*.\", False),\n        (\"\", \".*\", True),\n        (\"\", \"a*\", True),\n        (\"a\", \"ab*\", True),\n        (\"abc\", \"a.*c\", True),\n        (\"abcdef\", \"a.*f\", True)\n    ]\n    \n    for s, p, expected in test_cases:\n        result = isMatch(s, p)\n        assert result == expected, f\"s='{s}', p='{p}', expected={expected}, got={result}\"\n        print(f\"isMatch('{s}', '{p}') = {result} ✓\")\n\ntest_regex_matching()\n```\n\n## Common Mistakes\n\n1. **Wrong '*' handling**: Not considering both zero and multiple matches\n2. **Index errors**: Off-by-one errors when accessing string characters\n3. **Base case confusion**: Not handling empty string/pattern correctly\n4. **Greedy matching**: Assuming '*' should always match as much as possible\n5. **'.' misunderstanding**: Forgetting that '.' matches any single character\n\n## Follow-up Questions\n\n1. **Wildcard matching**: Different pattern with '?' and '*'\n2. **Multiple patterns**: Match against multiple regex patterns\n3. **Case sensitivity**: Handle case-insensitive matching\n4. **Extended regex**: Support for '+', '?', character classes\n5. **Pattern validation**: Check if the pattern itself is valid\n6. **Space optimization**: Reduce space complexity of DP solution\n\n## Interview Tips\n\n1. **Understand the problem**: Clarify what '.' and '*' mean exactly\n2. **Start with examples**: Work through examples to understand patterns\n3. **Break down cases**: Handle '*' and regular characters separately\n4. **Draw DP table**: Visualize the state transitions\n5. **Test edge cases**: Empty strings, patterns with multiple '*'\n6. **Explain both approaches**: Both recursive and iterative DP\n7. **Discuss complexity**: Time and space complexity analysis