Need regex to filter out .ru and other spammy email addresses
Filtering Spam Email Addresses with PHP and Regular Expressions

Learn how to effectively identify and filter out spammy email addresses, particularly those from suspicious top-level domains like .ru, using PHP and regular expressions.
Any application that collects email addresses has to deal with a significant amount of spam. One common characteristic of spam is that it originates from certain top-level domains (TLDs) that are frequently abused. This article shows how to use PHP and regular expressions to build a filter that identifies and blocks email addresses from such domains, using .ru as the primary example.
Understanding the Problem: Spam TLDs
Spammers often register domains in TLDs that have lax registration policies or are less frequently monitored. While .ru (Russia) is a common target for filtering due to historical spam trends, other TLDs like .xyz, .top, .bid, or even newly emerging ones can also be sources of unwanted emails. The goal is not to block entire countries, but to mitigate known spam vectors. A flexible regex approach allows you to maintain a blacklist of TLDs that are problematic for your specific application.
flowchart TD
    A[Email Address Input] --> B{Extract TLD?}
    B -- Yes --> C[Compare TLD to Blacklist]
    C -- Match --> D[Mark as Spam]
    C -- No Match --> E[Mark as Valid]
    B -- No --> E
Flowchart for Email Address TLD Filtering
Crafting the Regular Expression
The core of our filtering mechanism is a regular expression. We need a pattern that reliably extracts the TLD from an email address so that it can be checked against our blacklisted domains. A simple approach is to look for the last dot (.) followed by a run of letters at the end of the string and capture those letters. PHP's preg_match function performs the extraction, and in_array then looks the captured TLD up in the blacklist.
<?php
function isSpamEmail(string $email, array $blacklistTlds): bool
{
    // Basic email format validation (optional, but recommended).
    // Malformed addresses are treated as spam here; handle them separately if you prefer.
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        return true;
    }

    // Extract the TLD: a dot followed by 2-63 letters at the end of the string.
    // The upper bound is 63 (the maximum DNS label length) so that long TLDs
    // such as .download are not skipped. The TLD is captured in the named group "tld".
    if (preg_match('/\.(?<tld>[a-zA-Z]{2,63})$/', $email, $matches)) {
        $tld = strtolower($matches['tld']);
        // Blacklist entries must be lowercase and written without the leading dot.
        return in_array($tld, $blacklistTlds, true);
    }

    return false; // No letter-only TLD found, so there is nothing to compare
}

$blacklist = ['ru', 'xyz', 'top', 'bid'];

// Test cases
echo 'test@example.com: ' . (isSpamEmail('test@example.com', $blacklist) ? 'SPAM' : 'VALID') . "\n";
echo 'spam@bad.ru: ' . (isSpamEmail('spam@bad.ru', $blacklist) ? 'SPAM' : 'VALID') . "\n";
echo 'user@domain.xyz: ' . (isSpamEmail('user@domain.xyz', $blacklist) ? 'SPAM' : 'VALID') . "\n";
echo 'valid@another.net: ' . (isSpamEmail('valid@another.net', $blacklist) ? 'SPAM' : 'VALID') . "\n";
echo 'invalid-email: ' . (isSpamEmail('invalid-email', $blacklist) ? 'SPAM' : 'VALID') . "\n";
?>
PHP function to check for spam email addresses based on TLD blacklist.
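If you would rather have a single regular expression that matches blacklisted addresses directly, the pattern can be generated from the blacklist. The sketch below is one possible way to do that, not part of the function above: buildSpamTldPattern is a hypothetical helper name, and the list passed to it is only an example. Because the entries are escaped with preg_quote and matched literally, this variant also handles TLDs that contain digits or hyphens, such as xn--p1ai (the Punycode form of .рф), which a letters-only pattern cannot capture.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern from a TLD blacklist.
function buildSpamTldPattern(array $blacklistTlds): string
{
    // An empty list should match nothing; (?!) is a pattern that always fails.
    if ($blacklistTlds === []) {
        return '/(?!)/';
    }

    // Drop any leading dot and escape each entry so it is matched literally.
    $alternatives = array_map(
        static function (string $tld): string {
            return preg_quote(ltrim($tld, '.'), '/');
        },
        $blacklistTlds
    );

    // A literal dot, one of the blacklisted TLDs, then the end of the string.
    return '/\.(?:' . implode('|', $alternatives) . ')$/i';
}

// Example list (an assumption for illustration only).
$pattern = buildSpamTldPattern(['ru', 'xyz', 'top', 'bid', 'xn--p1ai']);

var_dump(preg_match($pattern, 'spam@bad.ru') === 1);      // bool(true)
var_dump(preg_match($pattern, 'user@MAIL.XYZ') === 1);    // bool(true), the i flag ignores case
var_dump(preg_match($pattern, 'test@example.com') === 1); // bool(false)
var_dump(preg_match($pattern, 'guru@example.com') === 1); // bool(false), "ru" must follow the last dot
```

The generated pattern only answers "does this address end in a blacklisted TLD?". It does not validate the address format, so it is best combined with a format check such as filter_var.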
The filter_var($email, FILTER_VALIDATE_EMAIL) call provides an initial check of the email format and is highly recommended before applying a custom regex for TLD filtering. It keeps malformed addresses from bypassing your TLD check.
Maintaining Your TLD Blacklist
The effectiveness of this filter depends heavily on your $blacklistTlds array. This list should be updated regularly based on the spam patterns you observe. Consider storing it in a configuration file or a database, or fetching it from a trusted external source if your application requires frequent updates. Keeping the list out of your application logic makes it easier to maintain.
<?php
// Example of loading the blacklist from a configuration file.
// config/spam_tlds.php might contain: <?php return ['ru', 'xyz', 'top'];
$blacklistTlds = require __DIR__ . '/config/spam_tlds.php';

// Or from a database (assumes an existing PDO connection in $pdo):
// $stmt = $pdo->query('SELECT tld FROM spam_tlds_blacklist');
// $blacklistTlds = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Now use $blacklistTlds with the isSpamEmail function defined earlier
$emailToCheck = 'another_spam@example.ru';
if (isSpamEmail($emailToCheck, $blacklistTlds)) {
    echo "'{$emailToCheck}' is a spam email.\n";
} else {
    echo "'{$emailToCheck}' is a valid email.\n";
}
?>
Demonstrates how to load the TLD blacklist from an external source.
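Entries that come from a file, a database, or an external feed may not be in the shape isSpamEmail expects, since it compares a lowercased TLD without the leading dot. A small normalization step can guard against that. The following is a minimal sketch under that assumption; normalizeTldBlacklist is a hypothetical helper name, and the sample input is made up for illustration.

```php
<?php
// Hypothetical helper: cleans up raw blacklist entries, whatever their source.
function normalizeTldBlacklist(array $rawEntries): array
{
    $normalized = [];
    foreach ($rawEntries as $entry) {
        // Trim whitespace, drop a leading dot, and lowercase the entry.
        $tld = strtolower(ltrim(trim((string) $entry), '.'));
        if ($tld !== '') {
            $normalized[] = $tld;
        }
    }

    // Remove duplicates and reindex the array from 0.
    return array_values(array_unique($normalized));
}

// Made-up input mixing case, whitespace, leading dots, duplicates, and an empty entry.
$blacklistTlds = normalizeTldBlacklist(['.RU', ' xyz ', 'top', 'ru', '']);
print_r($blacklistTlds); // contains 'ru', 'xyz', 'top' in that order
```

Running the loaded list through a step like this once, right after loading, keeps the comparison inside the filter simple and avoids silent misses caused by entries such as ".RU".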