How can I write a regex that matches non-greedily?
Mastering Non-Greedy Regular Expressions

Learn how to write non-greedy (lazy) regular expressions to match the shortest possible string, avoiding common pitfalls of greedy matching.
Regular expressions are powerful tools for pattern matching in text, but their default 'greedy' behavior can sometimes lead to unexpected results. By default, the quantifiers *, +, ?, {n,} and {n,m} attempt to match the longest possible string that still allows the overall pattern to succeed. This article explains the difference between greedy and non-greedy matching and shows how to write non-greedy regex patterns that match exactly what you intend.
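A minimal Python sketch (sample strings chosen for illustration) shows both halves of that rule: each greedy quantifier grabs as much as it can, yet gives characters back when the rest of the pattern needs them.

```python
import re

# Each greedy quantifier grabs as many characters as it can.
print(re.match(r'a*', 'aaab').group())      # prints: aaa
print(re.match(r'a+', 'aaab').group())      # prints: aaa
print(re.match(r'a?', 'aaab').group())      # prints: a
print(re.match(r'a{1,2}', 'aaab').group())  # prints: aa

# ...but only while the whole pattern can still match: a* first takes
# 'aaa', then backtracks to 'aa' so the literal 'ab' after it can match.
m = re.match(r'(a*)ab', 'aaab')
print(m.group(0), m.group(1))               # prints: aaab aa
```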
Understanding Greedy vs. Non-Greedy Matching
When a quantifier in a regular expression is applied, it tries to match as much text as it can while still allowing the overall pattern to succeed. This is known as 'greedy' matching. For example, if you have the string <a><b><c> and you want to match the content inside the first set of angle brackets, a greedy pattern consumes more than intended, running on to the last > in the string instead of stopping at the first one.
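Sketching the <a><b><c> case in Python makes the difference concrete. The capturing group ( ) is an addition for illustration; it isolates the text between the brackets so the two results are easy to compare.

```python
import re

text = '<a><b><c>'

# Greedy: .* runs to the end of the string, then backs off just far
# enough for the final '>' to match, so the group spans all three tags.
greedy = re.search(r'<(.*)>', text)
print(greedy.group(1))  # prints: a><b><c

# Lazy: .*? stops at the first '>', capturing only the first tag's content.
lazy = re.search(r'<(.*?)>', text)
print(lazy.group(1))    # prints: a
```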
Non-greedy (or 'lazy') matching, on the other hand, makes a quantifier match as few characters as possible, expanding only when the rest of the pattern would otherwise fail. This is often what you want when parsing structured text in which delimiters mark distinct segments and you want to match only up to the next delimiter, not the last one in the entire string.
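One subtlety worth checking: "as few characters as possible" is measured from where the match starts, not across the whole string. The engine still scans left to right and returns the first starting position at which a match succeeds, so a lazy pattern can return a longer match than a candidate further to the right. A small Python check, with a sample input chosen for illustration:

```python
import re

text = 'aaab'

# Index 0 is the leftmost position where a.*?b can succeed; from there,
# .*? expands one character at a time until it reaches the 'b'.
m = re.search(r'a.*?b', text)
print(m.group())  # prints: aaab  (not 'ab', which would start later, at index 2)
print(m.start())  # prints: 0
```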
flowchart TD
    A[Start Regex Engine] --> B{Quantifier Encountered?}
    B -->|Yes| C{Is it Greedy?}
    C -->|Yes| D[Match Longest Possible String]
    C -->|"No (Lazy)"| E[Match Shortest Possible String]
    D --> F{Does rest of pattern match?}
    E --> F
    F -->|Yes| G[Success: Return Match]
    F -->|No| H["Backtrack/Try Shorter (Greedy) or Longer (Lazy)"]
    H --> F
    B -->|No| G
Flowchart illustrating the decision process for greedy vs. non-greedy quantifiers in a regex engine.
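The backtracking loop in the flowchart can be mimicked with a deliberately simplified model. The function toy_match below is hypothetical: it handles only the fixed shape '<' + '.*' (or '.*?') + '>' anchored at the start of the string, whereas real engines are far more general. It records which lengths the quantifier tries, in order, so you can see greedy counting down from the longest candidate and lazy counting up from zero.

```python
import re
from typing import List, Optional, Tuple


def toy_match(text: str, lazy: bool) -> Tuple[Optional[str], List[int]]:
    """Hypothetical toy model of matching '<.*>' or '<.*?>' at index 0.

    Assumes text contains no newlines (a real '.' does not match a newline).
    Returns the matched substring (or None) and the lengths tried for the
    quantified part, in the order they were tried.
    """
    tried: List[int] = []
    if not text.startswith('<'):
        return None, tried
    available = len(text) - 1  # characters after the opening '<'
    # Greedy tries the longest length first and backtracks to shorter ones;
    # lazy tries the shortest first and extends one character at a time.
    lengths = range(0, available + 1) if lazy else range(available, -1, -1)
    for n in lengths:
        tried.append(n)
        close = 1 + n  # index where the closing '>' must appear
        if close < len(text) and text[close] == '>':
            return text[: close + 1], tried
    return None, tried


sample = '<a><b>'
print(toy_match(sample, lazy=False))  # prints: ('<a><b>', [5, 4])
print(toy_match(sample, lazy=True))   # prints: ('<a>', [0, 1])

# On this input the toy model agrees with Python's real engine.
assert toy_match(sample, lazy=False)[0] == re.match(r'<.*>', sample).group()
assert toy_match(sample, lazy=True)[0] == re.match(r'<.*?>', sample).group()
```

Both strategies end up testing the same kind of candidate; they differ only in the order of attempts, which is why they can stop at different > characters.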
How to Make a Quantifier Non-Greedy
To make a greedy quantifier non-greedy, append a question mark ? immediately after it. This ? acts as a modifier, changing the behavior of the preceding quantifier from 'match as much as possible' to 'match as little as possible'.
Here's a breakdown of common quantifiers and their non-greedy counterparts:
- * (zero or more) becomes *? (zero or more, non-greedy)
- + (one or more) becomes +? (one or more, non-greedy)
- ? (zero or one) becomes ?? (zero or one, non-greedy)
- {n,} (n or more) becomes {n,}? (n or more, non-greedy)
- {n,m} (n to m) becomes {n,m}? (n to m, non-greedy)
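Running each greedy/lazy pair from the list against the same input shows the effect side by side. A Python sketch; the input 'aaaa' and the bounds {2,} and {1,3} are arbitrary choices for illustration.

```python
import re

text = 'aaaa'

# (greedy, lazy) pairs from the list, with concrete bounds filled in.
pairs = [
    (r'a*', r'a*?'),
    (r'a+', r'a+?'),
    (r'a?', r'a??'),
    (r'a{2,}', r'a{2,}?'),
    (r'a{1,3}', r'a{1,3}?'),
]

# Expected: greedy forms take 'aaaa', 'aaaa', 'a', 'aaaa', 'aaa';
# lazy forms take '', 'a', '', 'aa', 'a'.
results = {}
for greedy, lazy in pairs:
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()

for pattern, matched in results.items():
    # repr() makes empty matches visible as ''
    print(f'{pattern:9} matched {matched!r}')
```

Note that a lazy quantifier with a minimum of zero (*? and ??) happily matches the empty string when nothing after it forces it to grow.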
Greedy example: <.*>
Given the string <span>Hello</span><span>World</span>, the greedy pattern <.*> matches the entire string: <span>Hello</span><span>World</span>. This is because .* first consumes the rest of the string and then backtracks only as far as the last >, so that the final > in the pattern can still match.
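Non-greedy example: <.*?>
Using the same string <span>Hello</span><span>World</span>, the non-greedy pattern <.*?> matches only the first tag, <span>. The .*? now matches the shortest possible sequence of characters, stopping at the first > that allows the pattern to complete. Applied repeatedly across the string (for example, with a global search), it returns each tag separately: <span>, </span>, <span>, </span>.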
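If the goal is the text between the tags rather than the tags themselves, a common approach is to place the lazy quantifier between an explicit opening and closing tag and capture it with a group. A Python sketch using the same sample string; with one capturing group, re.findall returns only the captured text.

```python
import re

text = '<span>Hello</span><span>World</span>'

# Lazy: each (.*?) stops at the nearest '</span>', giving one result per element.
lazy_contents = re.findall(r'<span>(.*?)</span>', text)
print(lazy_contents)    # prints: ['Hello', 'World']

# Greedy: (.*) runs to the last '</span>', swallowing the tags in between.
greedy_contents = re.findall(r'<span>(.*)</span>', text)
print(greedy_contents)  # prints: ['Hello</span><span>World']
```

This works for flat, non-nested elements like these; nested elements with the same tag name are beyond what this pattern handles.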
Practical Examples in Different Languages
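The concept of greedy and non-greedy quantifiers is shared by most modern regex engines, including those of Perl, PCRE, Python, JavaScript, Java, and .NET, though the API for using regex varies by programming language. POSIX basic and extended regular expressions, as used by default in tools such as grep and sed, do not support lazy quantifiers.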
Python
import re
text = '<span>Hello</span><span>World</span>'

# Greedy match
greedy_pattern = r'<.*>'
greedy_match = re.findall(greedy_pattern, text)
print(f"Greedy match: {greedy_match}")  # Output: ['<span>Hello</span><span>World</span>']

# Non-greedy match
non_greedy_pattern = r'<.*?>'
non_greedy_match = re.findall(non_greedy_pattern, text)
print(f"Non-greedy match: {non_greedy_match}")  # Output: ['<span>', '</span>', '<span>', '</span>']
JavaScript
const text = '<span>Hello</span><span>World</span>';

// Greedy match
const greedyPattern = /<.*>/g;
const greedyMatch = text.match(greedyPattern);
console.log(`Greedy match: ${greedyMatch}`); // Output: Greedy match: <span>Hello</span><span>World</span>

// Non-greedy match (the array is joined with commas when interpolated)
const nonGreedyPattern = /<.*?>/g;
const nonGreedyMatch = text.match(nonGreedyPattern);
console.log(`Non-greedy match: ${nonGreedyMatch}`); // Output: Non-greedy match: <span>,</span>,<span>,</span>
Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNonGreedy {
    public static void main(String[] args) {
        String text = "<span>Hello</span><span>World</span>";

        // Greedy match
        Pattern greedyPattern = Pattern.compile("<.*>");
        Matcher greedyMatcher = greedyPattern.matcher(text);
        while (greedyMatcher.find()) {
            System.out.println("Greedy match: " + greedyMatcher.group(0)); // Output (once): <span>Hello</span><span>World</span>
        }

        // Non-greedy match
        Pattern nonGreedyPattern = Pattern.compile("<.*?>");
        Matcher nonGreedyMatcher = nonGreedyPattern.matcher(text);
        while (nonGreedyMatcher.find()) {
            System.out.println("Non-greedy match: " + nonGreedyMatcher.group(0)); // Output (four lines): <span>, </span>, <span>, </span>
        }
    }
}
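While .*? is a common solution, be aware that it can still be inefficient: a lazy quantifier grows one character at a time and the engine retries the rest of the pattern after each step, which adds up when the input contains many near-matches before the correct one. Always test your regex with a variety of inputs to ensure both performance and correctness.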
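A related technique is to replace the lazy dot with a negated character class such as [^>]*, which can never cross the delimiter at all. The Python sketch below (sample strings are illustrative) shows that both forms agree on simple tag scanning but diverge when the rest of the pattern forces the lazy dot to keep expanding, which is exactly the kind of input worth testing.

```python
import re

text = '<span>Hello</span><span>World</span>'

# On simple tag scanning, the lazy dot and the negated class agree.
assert re.findall(r'<.*?>', text) == re.findall(r'<[^>]*>', text)

# When something must follow the '>', the lazy dot keeps expanding past
# earlier '>' characters until the whole pattern can succeed...
print(re.search(r'<.*?>x', '<a><b>x').group())    # prints: <a><b>x

# ...whereas [^>]* cannot cross a '>', so the match starts at the last tag.
print(re.search(r'<[^>]*>x', '<a><b>x').group())  # prints: <b>x
```

The negated class also avoids the grow-and-retry loop for the quantified part, since it stops at the first > by construction.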