What characters are allowed in an email address?
Understanding Valid Email Address Characters and Validation
Explore the complex rules governing email address characters, common validation pitfalls, and best practices for robust email handling in your applications.
Email addresses are fundamental to modern communication, yet their structure often leads to confusion, especially when it comes to validation. What characters are actually allowed? The answer is more nuanced than a simple regular expression. This article delves into the specifications defined by RFCs (Request for Comments), common misconceptions, and practical approaches to validating email addresses effectively.
The RFCs: Defining Email Address Structure
The definitive rules for email address syntax are primarily laid out in RFC 5322 (Internet Message Format) and RFC 5321 (Simple Mail Transfer Protocol). These documents describe an email address as having two main parts, a local-part and a domain, separated by an @ symbol. While the domain part is relatively straightforward (following DNS hostname rules), the local-part is where most of the complexity lies.
flowchart LR
    A[Email Address] --> B[Local Part]
    A --> C[Domain Part]
    B --> D["Allowed Characters (RFC 5322)"]
    C --> E["Hostname Rules (RFC 1035, 1123)"]
    D --> F["Alphanumeric, '.', '!', '#', '$', '%', '&', ''', '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|', '}', '~'"]
    D --> G["Quoted Strings (RFC 5322)"]
    G --> H["Any character (except CR/LF, backslash, double quote) when quoted"]
    E --> I["Alphanumeric, '-', '.'"]
    E --> J["No leading/trailing hyphen or dot"]
    E --> K["TLD must not be all-numeric"]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#ccf,stroke:#333,stroke-width:2px
    style C fill:#ccf,stroke:#333,stroke-width:2px
High-level structure and character rules for email addresses based on RFCs.
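As a minimal sketch of the two-part structure, the address can be split at the last @ rather than the first, because a quoted local-part may itself contain an @ while a domain never does. The function name split_address is hypothetical and used only for illustration.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local_part, domain) at the last '@'."""
    local_part, separator, domain = address.rpartition("@")
    # rpartition returns an empty separator when no '@' is present.
    if not separator or not local_part or not domain:
        raise ValueError(f"not a local-part@domain address: {address!r}")
    return local_part, domain


print(split_address("user@example.com"))        # ('user', 'example.com')
print(split_address('"odd@name"@example.com'))  # ('"odd@name"', 'example.com')
```

Splitting only tells the two parts apart; each part still needs its own checks.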
Local Part: The Wild West of Characters
The local-part (everything before the @) is surprisingly permissive. According to RFC 5322, it can be either a 'dot-atom' or a 'quoted-string'.
Dot-atom characters:
These include uppercase and lowercase English letters (A-Z, a-z), digits (0-9), and a selection of special characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~. A dot (.) is also allowed, but it cannot be the first or last character, nor can two dots appear consecutively.
Quoted-string characters:
If the local-part is enclosed in double quotes (e.g., "John Doe"@example.com), almost any printable ASCII character is allowed, including spaces. A double quote or backslash inside the string must be escaped with a backslash (e.g., \" for a double quote). This form is rarely seen in practice but is technically valid.
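The two local-part forms can be approximated with a pair of regular expressions. This is a simplified sketch, not a full RFC 5322 parser: it ignores comments, folding whitespace, and obsolete syntax, and the names ATEXT, DOT_ATOM, QUOTED_STRING, and is_valid_local_part are assumptions made for illustration.

```python
import re

# Characters allowed in an unquoted (dot-atom) local-part, other than the dot.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# Runs of those characters joined by single dots: no leading, trailing, or doubled dots.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
# Quoted form: printable ASCII and spaces, with '"' and '\' allowed only after a backslash.
QUOTED_STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"')


def is_valid_local_part(local_part: str) -> bool:
    """Accept a local-part that is either a dot-atom or a quoted-string."""
    return bool(
        DOT_ATOM.fullmatch(local_part) or QUOTED_STRING.fullmatch(local_part)
    )


print(is_valid_local_part("john.doe+news"))  # True
print(is_valid_local_part("john..doe"))      # False
print(is_valid_local_part('"John Doe"'))     # True
print(is_valid_local_part("John Doe"))       # False
```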
Domain Part: Stricter Rules
The domain part (everything after the @) is much more restrictive, adhering to DNS hostname conventions. It consists of one or more 'labels' separated by dots. Each label must start and end with an alphanumeric character and can contain alphanumeric characters or hyphens (-).
Key restrictions for the domain part:
- Labels cannot start or end with a hyphen.
- The total length of the domain name cannot exceed 255 octets, and each label is limited to 63.
- The top-level domain (TLD) cannot be all-numeric (e.g., example@123.123 and example@domain.123 are both invalid, because the final label 123 consists only of digits).
- Internationalized Domain Names (IDNs) are supported through Punycode encoding.
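The restrictions above can be sketched as a single check. The function name is_valid_domain is hypothetical, and the sketch assumes Python's built-in idna codec is acceptable for converting internationalized names to Punycode; that codec implements the older IDNA 2003 rules, so applications that need current IDNA behavior may prefer a dedicated library.

```python
import re

# One DNS label: 1 to 63 characters, alphanumeric at both ends, hyphens only inside.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_domain(domain: str) -> bool:
    """Check a domain against the hostname restrictions listed above."""
    try:
        # Convert an internationalized name to its ASCII (Punycode) form first.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # empty or over-long labels, or characters IDNA rejects
    if not ascii_domain or len(ascii_domain) > 255:
        return False
    labels = ascii_domain.split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    # The top-level domain must not consist only of digits.
    return not labels[-1].isdigit()


print(is_valid_domain("mail.example.com"))  # True
print(is_valid_domain("-bad.example.com"))  # False
print(is_valid_domain("123.123"))           # False
print(is_valid_domain("bücher.example"))    # True
```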
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
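A commonly used, but still imperfect, regular expression for email validation. It trades strict RFC compliance for practicality: it rejects quoted local-parts, yet it accepts leading, trailing, and consecutive dots in the local-part as well as all-numeric TLDs.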
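For illustration, here is one way the pattern might be applied in Python. The helper name looks_like_email is an assumption; the pattern itself is copied unchanged from above, split across lines only for readability.

```python
import re

# The pattern shown above, compiled once for reuse.
EMAIL_PATTERN = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def looks_like_email(address: str) -> bool:
    """Return True when the address passes the basic syntax pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("user@-example.com"))                # False
print(looks_like_email("no-at-sign.example.com"))           # False
print(looks_like_email("a..b@example.com"))                 # True, although the RFC forbids ".."
print(looks_like_email('"John Doe"@example.com'))           # False, although the RFC allows it
```

The last two calls show where the pattern and the RFC disagree, which is why a syntax check alone is only a first filter.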
Practical Email Validation Strategies
Given the complexity and the discrepancy between RFCs and real-world implementations, a multi-faceted approach to email validation is often best:
- Basic Regex Check: Use a robust, but not overly strict, regular expression to catch obvious syntax errors. The regex above is a good starting point.
- Server-Side Validation: Always re-validate on the server. Client-side validation is for user experience, not security.
- DNS Lookup (MX Record Check): Verify that the domain actually exists and has Mail Exchange (MX) records. This doesn't guarantee the address is deliverable but filters out many invalid domains.
- SMTP Verification (Carefully): Attempting to connect to the mail server and verifying the address (without sending an email) can be very effective but can also be slow, resource-intensive, and sometimes blocked by mail servers.
- Confirmation Email: The most reliable method is to send a confirmation email. If the user receives it and clicks a link, the address is valid and deliverable.
Avoid overly strict regex patterns that might reject valid email addresses used by some providers. Focus on deliverability rather than strict RFC compliance for the local-part.
1. Implement Basic Regex Validation
Use a well-tested regular expression (like the one provided) to perform an initial client-side and server-side check for common syntax errors. This catches most malformed addresses quickly.
2. Perform DNS MX Record Lookup
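On the server side, after basic regex validation, perform a DNS lookup for MX records associated with the email's domain. If the domain does not exist, the address is invalid. If the domain exists but has no MX records, be careful before rejecting it: RFC 5321 tells sending servers to fall back to the domain's address (A or AAAA) records, so such a domain may still receive email. Only a domain with neither MX nor address records clearly cannot.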
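A possible shape for this check is sketched below. It assumes the third-party dnspython package (dns.resolver.resolve, available in dnspython 2.x) for the actual lookups; the names domain_accepts_mail and _query_dns are hypothetical. The sketch treats a domain with address records but no MX records as potentially deliverable, since sending servers may fall back to those records, and it treats a single MX record of "." (a "null MX") as an explicit refusal of mail.

```python
from typing import Callable


def _query_dns(domain: str, record_type: str) -> list[str]:
    """Return record values for the domain, or [] when none exist.

    Assumption: relies on the third-party dnspython package.
    Timeouts and server failures are left to propagate, so callers can
    retry instead of rejecting an address over a temporary DNS problem.
    """
    import dns.resolver  # imported lazily; only needed for real lookups

    try:
        answers = dns.resolver.resolve(domain, record_type)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    if record_type == "MX":
        return [str(rdata.exchange) for rdata in answers]
    return [rdata.to_text() for rdata in answers]


def domain_accepts_mail(
    domain: str, query: Callable[[str, str], list[str]] = _query_dns
) -> bool:
    """Decide whether a domain could plausibly receive email."""
    mx_hosts = query(domain, "MX")
    if mx_hosts:
        # A lone MX of "." means the domain explicitly accepts no mail.
        return mx_hosts != ["."]
    # No MX records: fall back to the domain's address records.
    return bool(query(domain, "A") or query(domain, "AAAA"))
```

Passing the lookup function as a parameter is a design choice made here so the decision logic can be exercised with canned answers, without touching the network.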
3. Send a Confirmation Email
For critical applications or user registrations, send an email to the provided address containing a unique verification link. This is the most reliable way to confirm both the validity and deliverability of an email address.
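The verification link can be sketched as follows. Everything named here is an assumption for illustration: the in-memory pending store stands in for a real database, the base URL is a placeholder, and the 24-hour lifetime is an arbitrary choice. Sending the email itself is left out.

```python
import hashlib
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumed lifetime of a verification link

# Hypothetical in-memory store mapping token digest -> (email, expiry time).
pending: dict[str, tuple[str, float]] = {}


def create_verification_link(
    email: str, base_url: str = "https://example.com/verify"
) -> str:
    """Create a single-use link to include in the confirmation email."""
    token = secrets.token_urlsafe(32)  # unguessable and safe to place in a URL
    # Store only a hash, so a leaked store does not reveal usable tokens.
    digest = hashlib.sha256(token.encode()).hexdigest()
    pending[digest] = (email, time.time() + TOKEN_TTL_SECONDS)
    return f"{base_url}?token={token}"


def confirm(token: str) -> Optional[str]:
    """Return the verified email for a valid token, or None otherwise."""
    digest = hashlib.sha256(token.encode()).hexdigest()
    entry = pending.pop(digest, None)  # pop makes each token single-use
    if entry is None:
        return None
    email, expires_at = entry
    return email if time.time() <= expires_at else None
```

When the user follows the link, the handler would pass the token query parameter to confirm and mark the returned address as verified.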