https://leetcode.com/problems/wildcard-matching/
Problem
Implement wildcard pattern matching with support for '?' and '*'.
'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false
Solution
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
public class Solution { | |
// Stores failed matches for given pattern and string. Used for memoization. | |
private Map<Integer, Set<String>> map; | |
public Solution() { | |
map = new HashMap<Integer, Set<String>>(); | |
} | |
public boolean isMatch(String s, String p) { | |
return isMatch2(s, cleanPattern(p)); | |
} | |
public boolean isMatch2(String s, String p) { | |
int sPtr = 0; | |
int pPtr = 0; | |
char sChar = ' '; | |
char pChar = ' '; | |
while (sPtr < s.length() && pPtr < p.length()) { | |
sChar = s.charAt(sPtr); | |
pChar = p.charAt(pPtr); | |
if (pChar == sChar || pChar == '?') { | |
// Continue | |
sPtr += 1; | |
pPtr += 1; | |
} | |
else if (pChar == '*') { | |
// Check if a previous attempt to match string and pattern has already failed. | |
Set<String> failed = map.get(pPtr); | |
// If it doesn't match, we don't need to go down the same rabbit hole and waste computation cycles. | |
if (failed == null || !failed.contains(s.substring(sPtr))) { | |
boolean isMatch = isMatch(s.substring(sPtr), p.substring(pPtr + 1)); | |
if (isMatch) { | |
return true; | |
} | |
if (failed == null) { | |
failed = new HashSet<String>(); | |
} | |
} | |
// Record failed match so we can look it up later. | |
failed.add(s.substring(sPtr)); | |
map.put(pPtr, failed); | |
sPtr += 1; | |
} | |
else { | |
return false; | |
} | |
} | |
// Consume all wildcards | |
while (pPtr < p.length() && p.charAt(pPtr) == '*') { | |
pPtr++; | |
} | |
if (sPtr == s.length() && pPtr == p.length()) { | |
// All characters consumed | |
return true; | |
} | |
if (sPtr < s.length() && pPtr == p.length()) { | |
// Remaining characters in s | |
return false; | |
} | |
if (sPtr == s.length() && pPtr < p.length()) { | |
// Remaining characters in p | |
return false; | |
} | |
return false; | |
} | |
// Remove contiguous wildcard characters from string. | |
// For example, passing in "aa**bb" will return "aa*bb" | |
private String cleanPattern(String s) { | |
StringBuilder copy = new StringBuilder(); | |
char lastChar = ' '; | |
for (int i = 0; i < s.length(); i++) { | |
if (s.charAt(i) == '*' && lastChar == '*') { | |
continue; | |
} | |
lastChar = s.charAt(i); | |
copy.append(lastChar); | |
} | |
return copy.toString(); | |
} | |
} |
#1
The general approach is to loop through each character in s and p and compare them. Break out of the loop once the end is reached for either s or p.
- If the characters match, then move on to the next set of characters.
- If the pattern character is '?', then move on to the next set of characters.
- If the pattern character is a wildcard '*', then call the function recursively with the wildcard character removed. If the recursive call returns true, then the candidate string and pattern matches. If the recursive call returns false, then move on to the next character in the candidate string.
- If the characters don't match, then return false.
Finally, return true if the end has been reached for both s and p. Otherwise, return false if the end has not been reached for either s or p.
#2
The first approach works, but has an incredibly inefficient runtime complexity of O(2^n). This is caused by the fact that whenever a wildcard is encountered, it performs a recursive call and possibly recomputes the same exact s and p multiple times. Due to the O(2^n) time complexity, LeetCode threw a 'Time limit exceeded" error.
This problem is analogous to the fibonacci problem where the same computation is performed many times. |
To solve the issue of duplicate computations, we can use a technique called memoization. By creating a lookup table to store failed matches, we can avoid going down the same rabbit hole by checking if s and p has already failed to match.
#3
The second approach significantly improves runtime complexity, but LeetCode still threw a 'Time limit exceeded" error on the inputs:
s = "aaaaabbbbbababbabbbbaaabbaabaaabaabbbbabaaaabaaabbbbbbabbaaaaabbbaaababbaaabaabbaabbbbaaabaabbbaabbbbabbabaabbabbbbaabbaabbabaababababbaabbabbbaaabaaababaaaabbaaaaabbbaaaabbbababbabaaaabbbaabaabbaaaabaaab"
p = "*aab***aaab**a*b**b*aa***baabababb*abbbbb**a**ba******b**b*a*ab*b***aa*****a*aba*a***aa*b****a*a*****"
Upon closer inspection, we can infer that the contiguous wildcards was causing the algorithm to branch out wildly. To solve this issue, we can remove contiguous wildcards from p so it looks like
p = "*aab*aaab*a*b*b*aa*baabababb*abbbbb*a*ba*b*b*a*ab*b*aa*a*aba*a*aa*b*a*a*"
Conclusion
By combining the three approaches above, LeetCode accepted my submission.
This was one of the more challenging problems I have encountered, and justifiably so because the solution acceptance rate is 15.8% (13th lowest out of 273 problems on the site) and its difficulty is rated 'Hard'.
The real world application of the wildcard matching problem includes anything that uses text search, some of which are: Google search, Microsoft Word, text editors, etc.