Using regexp in MATLAB
Mastering Regular Expressions in MATLAB

Unlock the power of pattern matching in MATLAB using regular expressions for robust text processing and data extraction.
Regular expressions (regex) are a powerful tool for searching, manipulating, and validating text based on patterns. MATLAB provides a comprehensive set of functions to leverage regex, enabling users to perform complex string operations efficiently. This article will guide you through the fundamentals of using regular expressions in MATLAB, covering common functions, pattern syntax, and practical examples.
Understanding MATLAB's Regex Functions
MATLAB offers several built-in functions specifically designed for regular expression operations. The three primary functions you'll encounter are regexp, regexprep, and regexpi. Each serves a distinct purpose in text processing workflows.
[Flowchart: Input String, then Choose Function, then regexp (find matches), regexprep (replace matches), or regexpi (find matches, case-insensitive), each leading to its output.]
Overview of MATLAB's Regular Expression Functions
regexp and regexpi return indices and token information, while regexprep returns the modified string. Choose the function that aligns with your desired output.
Basic Pattern Matching with regexp
The regexp function finds occurrences of a regular expression pattern within a string. It can return the start and end indices of matches, the matched substrings, and captured tokens. Called with no options, it returns its outputs in a fixed order: [startIndex, endIndex, tokenExtents, matchStr, tokens, names, splitStr] = regexp(str, expression). To get one specific output instead, pass an output keyword such as 'match', 'tokens', or 'names': out = regexp(str, expression, outkey).
text = 'The quick brown fox jumps over the lazy dog.';
pattern = 'fox';
[startIndex, endIndex] = regexp(text, pattern);
disp(['Pattern found at index: ', num2str(startIndex), ' to ', num2str(endIndex)]);
% Find all words starting with lowercase 't' (regexp is case-sensitive, so 'The' is not matched)
pattern_all = '\<t\w*\>';
matches = regexp(text, pattern_all, 'match');
disp('Words starting with ''t'':');
disp(matches);
Using regexp to find a specific word and words starting with 't'.
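The examples above return indices and whole matches; captured tokens work the same way through the 'tokens' and 'names' output keywords. The following is a minimal sketch, assuming a made-up log line (logLine and its contents are hypothetical, chosen only for illustration):

```matlab
% Hypothetical input string, used only for illustration.
logLine = 'Sensor A7 reported 23.5 C at 14:05';

% 'tokens' returns the text captured by each parenthesized group.
% With 'once', the result is a cell array holding the tokens of the first match only.
tok = regexp(logLine, '(\d+):(\d+)', 'tokens', 'once');
hours = str2double(tok{1});
minutes = str2double(tok{2});

% 'names' returns a struct whose fields are the named groups (?<name>...).
parts = regexp(logLine, 'Sensor (?<id>\w+) reported (?<value>[0-9.]+)', 'names');

fprintf('Sensor %s read %s at %02d:%02d\n', parts.id, parts.value, hours, minutes);
```

Named tokens tend to be easier to maintain than numbered ones when a pattern has more than two or three groups, since the field names document what each group captures.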
Replacing Text with regexprep
The regexprep function allows you to replace all occurrences of a pattern in a string with a specified replacement string. This is incredibly useful for data cleaning, formatting, or anonymizing text. Its syntax is newStr = regexprep(str, expression, replace).
sentence = 'Email addresses: user1@example.com, user2@domain.org.';
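% Replace email addresses with '[REDACTED]'
% (explicit character classes keep the trailing ',' and '.' out of the match)
pattern_email = '[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+';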
redacted_sentence = regexprep(sentence, pattern_email, '[REDACTED]');
disp(redacted_sentence);
% Capitalize the first letter of each word
text_to_capitalize = 'hello world from matlab';
capitalized_text = regexprep(text_to_capitalize, '\<(\w)', '${upper($1)}');
disp(capitalized_text);
Examples of using regexprep for redaction and capitalization.
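Captured groups can also be reused directly in the replacement text as $1, $2, and so on. As a rough sketch, assuming dates written in day/month/year order (the dateStr value is hypothetical), the groups can be reordered like this:

```matlab
% Hypothetical input with dates in day/month/year order.
dateStr = 'Measured on 25/12/2023 and 01/01/2024';
datePattern = '(\d{2})/(\d{2})/(\d{4})';

% Each parenthesized group becomes a token: $1 = day, $2 = month, $3 = year.
isoStr = regexprep(dateStr, datePattern, '$3-$2-$1');
disp(isoStr);

% The 'once' option limits the replacement to the first match.
firstOnly = regexprep(dateStr, datePattern, '$3-$2-$1', 'once');
disp(firstOnly);
```

Here the first call rewrites both dates, while the second leaves 01/01/2024 untouched.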
With regexprep, you can use tokens from the matched pattern in your replacement string. For example, $1 refers to the first captured group, and the dynamic expression ${upper($1)} inserts that group converted to uppercase.
Case-Insensitive Matching with regexpi
For scenarios where the case of the characters doesn't matter, regexpi provides case-insensitive pattern matching. It behaves similarly to regexp but ignores case differences. The syntax is identical to regexp.
data = 'Apple, banana, ORANGE, grape, apple.';
pattern = 'apple';
% Case-sensitive match (only the final lowercase 'apple')
matches_sensitive = regexp(data, pattern, 'match');
disp('Case-sensitive matches:');
disp(matches_sensitive);
% Case-insensitive match (both 'Apple' and 'apple')
matches_insensitive = regexpi(data, pattern, 'match');
disp('Case-insensitive matches:');
disp(matches_insensitive);
Comparing case-sensitive (regexp) and case-insensitive (regexpi) matching.
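Case-insensitive matching is also available from regexp itself, either through the 'ignorecase' option or through the (?i) flag at the start of the pattern. A small sketch reusing the data string from the example above, with illustrative variable names:

```matlab
data = 'Apple, banana, ORANGE, grape, apple.';

% The 'ignorecase' option makes regexp ignore letter case, like regexpi.
viaOption = regexp(data, 'apple', 'match', 'ignorecase');

% The (?i) flag switches on case-insensitive matching inside the pattern itself.
viaFlag = regexp(data, '(?i)apple', 'match');

% Reference result from regexpi for comparison.
viaRegexpi = regexpi(data, 'apple', 'match');

% All three approaches should return the same cell array of matches.
disp(isequal(viaOption, viaFlag, viaRegexpi));
```

The inline flag can be convenient when the pattern is stored as data (for example in a configuration file) and the calling code cannot easily pass extra options.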