Wildcard string matching
Categories:
Mastering Wildcard String Matching: Algorithms and Implementations

Explore the fundamental algorithms and practical implementations for wildcard string matching, covering classic approaches and their complexities.
Wildcard string matching is a common problem in computer science, essential for tasks like file searching, text processing, and regular expression engines. It involves determining if a given text string matches a pattern containing special 'wildcard' characters. The most common wildcards are *
(which matches zero or more characters) and ?
(which matches exactly one character). This article delves into the algorithms used to solve this problem, focusing on their logic, complexity, and practical application.
Understanding the Problem: Wildcards Explained
Before diving into algorithms, it's crucial to understand the behavior of the two primary wildcard characters:
?
(Question Mark): This wildcard matches any single character. For example, the patterna?c
would matchabc
,adc
, but notac
orabcc
.*
(Asterisk): This wildcard matches zero or more occurrences of any character. This is where the complexity often arises. For instance, the patterna*c
would matchac
,abc
,abcc
,axbyc
, and so on.
The challenge lies in efficiently handling the *
wildcard, as it can represent a variable number of characters, leading to multiple potential matching paths.
flowchart TD A[Start Matching] --> B{Current Pattern Char?} B --> |'?'| C[Match Any Single Char] B --> |'*'| D[Match Zero or More Chars] B --> |Literal Char| E{Text Char Matches?} C --> F[Advance Both Ptr] E --> |Yes| F E --> |No| G[Mismatch] D --> H[Try Matching Zero Chars] D --> I[Try Matching One or More Chars] H --> F I --> F F --> B G --> J[End Mismatch] F --> K{End of Pattern & Text?} K --> |Yes| L[Match Found] K --> |No| G
Basic Logic Flow for Wildcard Matching
Recursive Approach with Backtracking
A natural way to think about wildcard matching is through recursion with backtracking. The core idea is to consider different possibilities when a wildcard character is encountered. This approach can be quite intuitive but may suffer from performance issues due to redundant computations if not optimized.
Let's define a function isMatch(text, pattern)
that returns true
if text
matches pattern
.
Base Cases:
- If both
text
andpattern
are exhausted, it's a match. - If
pattern
is exhausted buttext
is not, it's a mismatch (unless the remainingtext
can be consumed by a preceding*
). - If
text
is exhausted butpattern
is not, it's a mismatch (unless the remainingpattern
consists only of*
characters).
Recursive Steps:
- If
pattern[j]
is?
orpattern[j]
matchestext[i]
: Advance bothi
andj
. - If
pattern[j]
is*
: This is the tricky part. The*
can match zero characters, or one character, or two, and so on. We explore two possibilities:*
matches zero characters: Try matchingtext[i]
withpattern[j+1]
(effectively skipping*
).*
matches one or more characters: Try matchingtext[i+1]
withpattern[j]
(effectively consumingtext[i]
with*
and keeping*
active for subsequent characters).
This recursive solution can be optimized using memoization (dynamic programming) to store results of subproblems and avoid recomputing them.
def isMatchRecursive(text, pattern):
memo = {}
def dp(i, j):
if (i, j) in memo:
return memo[(i, j)]
# Base cases
if j == len(pattern):
return i == len(text)
if i == len(text):
# If text is exhausted, pattern must be all '*' to match
for k in range(j, len(pattern)):
if pattern[k] != '*':
return False
return True
# Recursive steps
if pattern[j] == '?' or pattern[j] == text[i]:
result = dp(i + 1, j + 1)
elif pattern[j] == '*':
# '*' matches zero characters (skip '*') OR
# '*' matches one or more characters (consume text[i] with '*' and keep '*' active)
result = dp(i, j + 1) or dp(i + 1, j)
else:
result = False
memo[(i, j)] = result
return result
return dp(0, 0)
# Example Usage:
# print(isMatchRecursive("adceb", "*a*b")) # True
# print(isMatchRecursive("acdcb", "a*c?b")) # False
# print(isMatchRecursive("aa", "a")) # False
Python implementation of recursive wildcard matching with memoization.
Iterative Dynamic Programming Approach
An iterative dynamic programming (DP) approach provides a more structured way to solve the wildcard matching problem, often with better space complexity than a purely recursive solution (if memoization is not carefully managed). We typically use a 2D boolean array, dp[i][j]
, where dp[i][j]
is true
if the first i
characters of the text match the first j
characters of the pattern.
Initialization:
dp[0][0] = true
(empty text matches empty pattern).dp[0][j] = true
ifpattern[0...j-1]
consists only of*
characters (an empty string can be matched by*
,**
, etc.).
Transitions:
For dp[i][j]
:
- If
pattern[j-1]
is?
orpattern[j-1]
matchestext[i-1]
:dp[i][j] = dp[i-1][j-1]
(match the current characters and rely on the previous subproblem). - If
pattern[j-1]
is*
:dp[i][j]
can betrue
if either:*
matches zero characters:dp[i][j-1]
(effectively skipping*
in the pattern).*
matches one or more characters:dp[i-1][j]
(effectively consumingtext[i-1]
with*
and keeping*
active). So,dp[i][j] = dp[i][j-1] || dp[i-1][j]
.
- Otherwise (literal mismatch):
dp[i][j] = false
.
class Solution {
public boolean isMatch(String s, String p) {
int m = s.length();
int n = p.length();
boolean[][] dp = new boolean[m + 1][n + 1];
// Base case: empty string matches empty pattern
dp[0][0] = true;
// Initialize for patterns starting with '*' (matching empty string)
for (int j = 1; j <= n; j++) {
if (p.charAt(j - 1) == '*') {
dp[0][j] = dp[0][j - 1];
}
}
// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (p.charAt(j - 1) == '?' || p.charAt(j - 1) == s.charAt(i - 1)) {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '*') {
// '*' matches zero characters (dp[i][j-1])
// OR '*' matches one or more characters (dp[i-1][j])
dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
} else {
dp[i][j] = false;
}
}
}
return dp[m][n];
}
}
// Example Usage:
// Solution sol = new Solution();
// System.out.println(sol.isMatch("adceb", "*a*b")); // True
// System.out.println(sol.isMatch("acdcb", "a*c?b")); // False
// System.out.println(sol.isMatch("aa", "a")); // False
Java implementation of iterative dynamic programming for wildcard matching.
*
. Correct initialization of the DP table is crucial for accurate results.Greedy Approach (Two Pointers)
For certain types of wildcard matching problems (specifically, when *
can match any sequence of characters, but not when it's tied to a specific character like in regular expressions), a greedy approach using two pointers can be more efficient. This method avoids the overhead of DP table construction but requires careful handling of backtracking for *
.
We maintain pointers for the current position in text
(i
) and pattern
(j
). We also need to store the last encountered *
position (star_idx
) and the corresponding text
position (match_idx
) where *
started matching.
Algorithm Steps:
- Iterate through
text
with pointeri
. - If
pattern[j]
is?
orpattern[j]
matchestext[i]
: Advance bothi
andj
. - If
pattern[j]
is*
: Storestar_idx = j
andmatch_idx = i
. Advancej
(assume*
matches zero characters for now). - If
pattern[j]
does not matchtext[i]
and a*
was previously encountered (star_idx != -1
):- Backtrack: Reset
j
tostar_idx + 1
. - Advance
match_idx
by one (meaning the*
now consumes one more character fromtext
). - Reset
i
tomatch_idx
.
- Backtrack: Reset
- If no
*
was encountered and there's a mismatch: Returnfalse
. - After iterating through
text
, check if the remainingpattern
consists only of*
characters. If so, it's a match; otherwise, it's a mismatch.
function isMatchGreedy(s, p) {
let sPtr = 0;
let pPtr = 0;
let starIdx = -1;
let matchIdx = 0;
while (sPtr < s.length) {
// Case 1: Current characters match or pattern has '?'
if (pPtr < p.length && (p.charAt(pPtr) === '?' || p.charAt(pPtr) === s.charAt(sPtr))) {
sPtr++;
pPtr++;
}
// Case 2: Pattern has '*'
else if (pPtr < p.length && p.charAt(pPtr) === '*') {
starIdx = pPtr; // Store '*' position
matchIdx = sPtr; // Store text position where '*' started matching
pPtr++; // Advance pattern pointer, assuming '*' matches 0 chars for now
}
// Case 3: Mismatch, but a '*' was seen previously
else if (starIdx !== -1) {
pPtr = starIdx + 1; // Backtrack pattern pointer to after the last '*'
matchIdx++; // Advance text pointer, making '*' consume one more char
sPtr = matchIdx; // Reset text pointer to new matchIdx
}
// Case 4: Mismatch and no '*' to backtrack to
else {
return false;
}
}
// Consume any remaining '*' in the pattern
while (pPtr < p.length && p.charAt(pPtr) === '*') {
pPtr++;
}
return pPtr === p.length;
}
// Example Usage:
// console.log(isMatchGreedy("adceb", "*a*b")); // True
// console.log(isMatchGreedy("acdcb", "a*c?b")); // False
// console.log(isMatchGreedy("aa", "a")); // False
JavaScript implementation of the greedy (two-pointer) wildcard matching algorithm.