Regular expression to match string starting with a specific word
Mastering Regular Expressions: Matching Strings Starting with a Specific Word
Learn how to construct regular expressions to accurately identify strings that begin with a predefined word or phrase, covering various scenarios and regex engines.
Regular expressions (regex) are powerful tools for pattern matching in strings. A common task is to find strings that start with a particular word or sequence of characters. This article will guide you through the fundamental regex constructs to achieve this, providing examples and explaining the nuances across different regex flavors.
The Basics: Anchoring to the Start of a String
The most crucial element when matching a string's beginning is the ^ (caret) anchor. This special character asserts that the match must occur at the very start of the string (or, when a multiline flag is enabled, at the start of any line). Without it, your regex might match the word anywhere within the string, which is not what we want for a 'starts with' condition.
^word
Basic regex to match strings starting with 'word'.
Let's break this down:
^: This is the start-of-string anchor. It ensures that whatever follows it must be at the beginning of the input string.
word: This is the literal string you want to match. The regex engine will look for this exact sequence of characters immediately after the start of the string.
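As a minimal sketch of the anchor's effect, the Python snippet below compares an anchored and an unanchored pattern. The sample strings are made up for illustration and are not from any particular dataset.

```python
import re

# Hypothetical sample inputs, chosen only for illustration.
samples = ["word games", "a word game", "wordplay"]

anchored = re.compile(r"^word")    # must match at the start of the string
unanchored = re.compile(r"word")   # may match anywhere in the string

# re.search scans the whole string, so the anchor alone decides
# whether a match in the middle of the string is accepted.
anchored_hits = [s for s in samples if anchored.search(s)]
unanchored_hits = [s for s in samples if unanchored.search(s)]

print(anchored_hits)    # ['word games', 'wordplay']
print(unanchored_hits)  # ['word games', 'a word game', 'wordplay']
```

Note that 'wordplay' still matches the anchored pattern, because ^word only requires the string to begin with those four characters.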
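Regex matching is case-sensitive by default. To match regardless of case, use a case-insensitive flag (e.g., /^word/i in JavaScript, re.IGNORECASE in Python) or specify alternatives like ^[Ww]ord.
Matching a Word Followed by Anything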
Often, you don't just want to match the starting word, but also the rest of the string that follows it. For this, you can combine the . (dot) metacharacter with the * (asterisk) quantifier.
^word.*
Regex to match strings starting with 'word' and followed by any characters.
Explanation:
^word: As before, this matches 'word' at the beginning of the string.
.: This matches any single character (except newline characters, by default).
*: This is a quantifier that means 'zero or more' of the preceding element. So, .* means 'zero or more of any character'.
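The breakdown above can be checked with a short Python sketch. The input strings are illustrative assumptions; the point is that the match covers the starting word plus everything after it on the same line.

```python
import re

pattern = re.compile(r"^word.*")

# The match spans the starting word and the rest of the line.
full = pattern.search("word of the day")
full_text = full.group(0)
print(full_text)  # word of the day

# '.*' also accepts zero characters, so the bare word still matches.
bare = pattern.search("word")
bare_text = bare.group(0)
print(bare_text)  # word

# No match when the word is not at the start of the string.
missing = pattern.search("the last word")
print(missing)  # None
```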
The . character typically does not match newline characters (\n). If you need . to match newlines as well, you'll often use a 'dotall' or 'singleline' flag (e.g., re.DOTALL in Python, the s flag in Perl/PCRE/JavaScript).
flowchart TD
    A[Start of String] --> B{"Is 'word' present?"}
    B -- Yes --> C[Match 'word']
    C --> D{"Are there more characters?"}
    D -- Yes --> E["Match any character ('.')"]
    E --> D
    D -- No --> F[End of Match]
    B -- No --> G[No Match]
Flowchart of ^word.* regex matching logic.
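To make the newline behaviour concrete, here is a small Python sketch that runs the same pattern with and without re.DOTALL. The two-line input string is an assumption made for the example.

```python
import re

# Hypothetical two-line input.
text = "word on line one\nline two"

default_match = re.search(r"^word.*", text)
dotall_match = re.search(r"^word.*", text, re.DOTALL)

# By default '.' stops at the newline; with DOTALL it runs to the end.
print(repr(default_match.group(0)))  # 'word on line one'
print(repr(dotall_match.group(0)))   # 'word on line one\nline two'
```

Whether the dotall flag is appropriate depends on whether 'the rest of the string' should mean the rest of the first line or the rest of the whole input.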
Handling Word Boundaries and Spaces
When matching a 'word', you might want to ensure it's a whole word and not just a prefix of another word (e.g., matching 'cat' but not 'catalog'). The \b (word boundary) assertion is perfect for this. Also, consider whether the word should be followed by a space or other specific characters.
^word\b.*
Regex to match strings starting with the whole word 'word'.
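Here, \b ensures that 'word' is followed by a non-word character (like a space or punctuation) or by the end of the string. This prevents matching 'wordy' or 'wordplay'. Note that digits and the underscore count as word characters, so 'word2' and 'word_count' are not matched either.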
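The following Python sketch probes where \b does and does not find a boundary. The candidate strings are hypothetical and were picked to cover spaces, punctuation, letters, digits, and the underscore.

```python
import re

whole_word = re.compile(r"^word\b")

# Hypothetical inputs covering the main boundary cases.
candidates = ["word", "word count", "word-play", "wordy", "word_count", "word2vec"]

# Map each input to whether 'word' is matched as a whole word at the start.
results = {s: bool(whole_word.search(s)) for s in candidates}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
# 'word': True
# 'word count': True
# 'word-play': True
# 'wordy': False
# 'word_count': False
# 'word2vec': False
```

The hyphen counts as a non-word character, so 'word-play' is accepted; if that is not desired, a stricter pattern than \b is needed.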
^word\s.*
Regex to match strings starting with 'word' followed by a whitespace character.
In this case, \s matches any single whitespace character (space, tab, newline, etc.). This is useful if you specifically expect the word to be followed by whitespace.
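One way to see the practical difference between the two patterns is to run both against the same inputs. This is a rough sketch with made-up strings, not an exhaustive comparison.

```python
import re

with_boundary = re.compile(r"^word\b.*")
with_space = re.compile(r"^word\s.*")

# Hypothetical inputs: space, tab, punctuation, and the bare word.
inputs = ["word list", "word\tlist", "word, list", "word"]

# For each input, record (matches \b version, matches \s version).
comparison = {
    s: (bool(with_boundary.search(s)), bool(with_space.search(s)))
    for s in inputs
}

for text, (boundary_ok, space_ok) in comparison.items():
    print(f"{text!r}: boundary={boundary_ok}, whitespace={space_ok}")
# 'word list': boundary=True, whitespace=True
# 'word\tlist': boundary=True, whitespace=True
# 'word, list': boundary=True, whitespace=False
# 'word': boundary=True, whitespace=False
```

The \s version rejects the bare word and the word followed by punctuation, because it requires an actual whitespace character after 'word'.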
Python
import re

text1 = "apple pie is delicious"
text2 = "applesauce recipe"  # starts with 'apple', but only as part of a longer word
text3 = "Apple juice"

# Case-sensitive, whole word
pattern1 = r"^apple\b.*"
print(f"'{text1}' matches pattern1: {bool(re.match(pattern1, text1))}")  # True
print(f"'{text2}' matches pattern1: {bool(re.match(pattern1, text2))}")  # False

# Case-insensitive, whole word (same pattern, with the re.IGNORECASE flag)
pattern2 = r"^apple\b.*"
print(f"'{text3}' matches pattern2 (case-insensitive): {bool(re.match(pattern2, text3, re.IGNORECASE))}")  # True
JavaScript
const text1 = "apple pie is delicious";
const text2 = "applesauce recipe"; // starts with 'apple', but only as part of a longer word
const text3 = "Apple juice";

// Case-sensitive, whole word
const pattern1 = /^apple\b.*/;
console.log(`'${text1}' matches pattern1: ${pattern1.test(text1)}`); // true
console.log(`'${text2}' matches pattern1: ${pattern1.test(text2)}`); // false

// Case-insensitive, whole word (the i flag ignores case)
const pattern2 = /^apple\b.*/i;
console.log(`'${text3}' matches pattern2 (case-insensitive): ${pattern2.test(text3)}`); // true