What characters are allowed in an email address?


Understanding Valid Email Address Characters and Validation


Explore the complex rules governing email address characters, common validation pitfalls, and best practices for robust email handling in your applications.

Email addresses are fundamental to modern communication, yet their structure often leads to confusion, especially when it comes to validation. What characters are actually allowed? The answer is more nuanced than a simple regular expression can capture. This article covers the specifications defined by the relevant RFCs (Requests for Comments), common misconceptions, and practical approaches to validating email addresses effectively.

The RFCs: Defining Email Address Structure

The definitive rules for email address syntax are primarily laid out in RFC 5322 (Internet Message Format) and RFC 5321 (Simple Mail Transfer Protocol). These documents describe an email address as having two main parts: a local-part and a domain, separated by an @ symbol. While the domain part is relatively straightforward (following DNS hostname rules), the local-part is where most of the complexity lies.

flowchart LR
    A[Email Address] --> B[Local Part]
    A --> C[Domain Part]
    B --> D["Allowed Characters (RFC 5322)"]
    C --> E["Hostname Rules (RFC 1035, 1123)"]
    D --> F["Alphanumeric, '.', '!', '#', '$', '%', '&', ''', '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|', '}', '~'"]
    D --> G["Quoted Strings (RFC 5322)"]
    G --> H["Spaces and most printable ASCII when quoted (backslash and double quote must be escaped)"]
    E --> I["Alphanumeric, '-', '.'"]
    E --> J["No leading/trailing hyphen or dot"]
    E --> K["TLD must not be all-numeric"]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#ccf,stroke:#333,stroke-width:2px
    style C fill:#ccf,stroke:#333,stroke-width:2px

High-level structure and character rules for email addresses based on RFCs.
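As a rough illustration of the two-part structure, the sketch below splits an address at its last @ sign. The function name `split_address` is an assumption made for this example. Splitting at the last @ rather than the first matters because a quoted local-part may itself contain an @, while a domain cannot.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local_part, domain) at the LAST '@'.

    Hypothetical helper for illustration; it only separates the two
    parts and does not validate either of them.
    """
    local_part, separator, domain = address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not a local-part@domain address: {address!r}")
    return local_part, domain


if __name__ == "__main__":
    print(split_address("user.name+tag@example.com"))
    print(split_address('"odd@local"@example.com'))
```

Each half can then be checked against its own rules, which differ considerably between the local-part and the domain.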

Local Part: The Wild West of Characters

The local-part (everything before the @) is surprisingly permissive. According to RFC 5322, it takes one of two forms: a 'dot-atom' or a 'quoted-string'.

Dot-atom characters: These include uppercase and lowercase English letters (A-Z, a-z), digits (0-9), and a selection of special characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~. A dot (.) is also allowed, but it cannot be the first or last character, nor can two dots appear consecutively.

Quoted-string characters: If the local-part is enclosed in double quotes (e.g., "John Doe"@example.com), almost any printable ASCII character is allowed, including spaces, an @ sign, or consecutive dots. A double quote or backslash inside the string must be escaped with a backslash (e.g., \" for a double quote). This form is rarely seen in practice but is technically valid.
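The two local-part forms can be approximated with a pair of regular expressions. This is a simplified sketch, not a full RFC 5322 parser: it ignores comments, folding whitespace, and the obsolete syntax, and the names `DOT_ATOM`, `QUOTED_STRING`, and `is_valid_local_part` are assumptions made for this example.

```python
import re

# "atext" characters: letters, digits, and the permitted specials
# (including the backtick). The dot is handled separately below.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom: runs of atext joined by single dots, so no leading,
# trailing, or consecutive dots can match.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

# quoted-string (simplified): space or printable ASCII other than
# '"' and '\', or a backslash followed by any printable ASCII character.
QUOTED_STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"')


def is_valid_local_part(local_part: str) -> bool:
    """Return True if local_part is a dot-atom or a (simplified) quoted-string."""
    return bool(
        DOT_ATOM.fullmatch(local_part) or QUOTED_STRING.fullmatch(local_part)
    )


if __name__ == "__main__":
    for candidate in ["user.name+tag", ".user", "user..name", '"John Doe"', "John Doe"]:
        print(f"{candidate!r}: {is_valid_local_part(candidate)}")
```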

⚠️

While RFCs allow a wide range of characters in the local-part, many email providers and systems impose stricter rules. Relying solely on RFC compliance for validation can lead to accepting addresses that are rejected by common services.

Domain Part: Stricter Rules

The domain part (everything after the @) is much more restrictive, adhering to DNS hostname conventions. It consists of one or more 'labels' separated by dots. Each label must start and end with an alphanumeric character and can contain alphanumeric characters or hyphens (-).

Key restrictions for the domain part:

Labels cannot start or end with a hyphen.
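The total length of the domain name cannot exceed 255 characters, and each label is limited to 63 characters.
The top-level domain (TLD) cannot be all-numeric: example@123.123 and example@domain.123 are both invalid, while example@123.example.com is fine because the restriction applies only to the final label.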
Internationalized Domain Names (IDNs) are supported through Punycode encoding.
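The restrictions above can be sketched as a small checker. The names `to_ascii_domain` and `is_valid_domain` are assumptions made for this example, and the Punycode conversion relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; a production system may prefer a dedicated IDNA library.

```python
import re

# One DNS label: alphanumeric at both ends, hyphens allowed inside,
# at most 63 characters in total.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

MAX_DOMAIN_LENGTH = 255  # limit quoted in the list above


def to_ascii_domain(domain: str) -> str:
    """Convert an internationalized domain name to its Punycode form."""
    return domain.encode("idna").decode("ascii")


def is_valid_domain(domain: str) -> bool:
    """Apply the label, length, and numeric-TLD rules to a domain."""
    try:
        ascii_domain = to_ascii_domain(domain)
    except UnicodeError:
        # Raised for empty or over-long labels and unencodable input.
        return False
    if len(ascii_domain) > MAX_DOMAIN_LENGTH:
        return False
    labels = ascii_domain.split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    # The final label (the TLD) must not consist only of digits.
    return not labels[-1].isdigit()


if __name__ == "__main__":
    print(to_ascii_domain("bücher.example"))
    for candidate in ["example.com", "-bad.example.com", "domain.123"]:
        print(f"{candidate}: {is_valid_domain(candidate)}")
```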

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

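A commonly used, but still imperfect, regular expression for email validation. It balances RFC compliance against practical usage: it accepts leading, trailing, and consecutive dots in the local-part, rejects quoted local-parts, and neither requires a dot in the domain nor checks for an all-numeric TLD.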
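A minimal sketch of applying this pattern in Python follows. The function name `looks_like_email` is an assumption made for this example. The sketch drops the `^` and `$` anchors and uses `re.fullmatch` instead, because in Python `$` also matches just before a trailing newline, which would let an address ending in "\n" slip through.

```python
import re

# Same pattern as above, without the ^ and $ anchors (fullmatch anchors both ends).
EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def looks_like_email(address: str) -> bool:
    """Cheap syntax screen; passing it does not prove deliverability."""
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    samples = [
        "user.name+tag@example.com",  # ordinary address
        ".user..name.@example.com",   # accepted, although not a valid dot-atom
        '"John Doe"@example.com',     # rejected, although RFC-valid
        "user@-example.com",          # rejected: label starts with a hyphen
    ]
    for sample in samples:
        print(f"{sample!r}: {looks_like_email(sample)}")
```

The second and third samples show both directions in which the pattern departs from the RFC grammar, which is why it works as a first filter rather than a final verdict.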

Practical Email Validation Strategies

Given the complexity and the discrepancy between RFCs and real-world implementations, a multi-faceted approach to email validation is often best:

Basic Regex Check: Use a robust, but not overly strict, regular expression to catch obvious syntax errors. The regex above is a good starting point.
Server-Side Validation: Always re-validate on the server. Client-side validation is for user experience, not security.
DNS Lookup (MX Record Check): Verify that the domain actually exists and has Mail Exchange (MX) records. This doesn't guarantee the address is deliverable but filters out many invalid domains.
SMTP Verification (Carefully): Attempting to connect to the mail server and verifying the address (without sending an email) can be very effective but can also be slow, resource-intensive, and sometimes blocked by mail servers.
Confirmation Email: The most reliable method is to send a confirmation email. If the user receives it and clicks a link, the address is valid and deliverable.

Avoid overly strict regex patterns that might reject valid email addresses used by some providers. Focus on deliverability rather than strict RFC compliance for the local-part.

💡

For most applications, a combination of a reasonable regex and sending a confirmation email provides the best balance of user experience and validation accuracy. Overly complex regex patterns often break more than they fix.

1. Implement Basic Regex Validation

Use a well-tested regular expression (like the one provided) to perform an initial client-side and server-side check for common syntax errors. This catches most malformed addresses quickly.

2. Perform DNS MX Record Lookup

On the server side, after basic regex validation, perform a DNS lookup for MX records associated with the email's domain. If the domain has neither MX records nor address (A/AAAA) records, which RFC 5321 treats as a fallback delivery target, it cannot receive email, indicating an invalid address.
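The decision logic for this step can be sketched without committing to a particular DNS library. In the example below, `lookup` is a hypothetical callable standing in for whatever resolver the application uses, and `FAKE_DNS` is made-up data for demonstration only. One refinement worth noting: RFC 5321 treats a domain without MX records as if it had an implicit MX pointing at its own address records, so the sketch falls back to A/AAAA before giving up. It also honors the "null MX" convention of RFC 7505, where a single MX of "." declares that the domain accepts no mail.

```python
from typing import Callable, Sequence

# Hypothetical resolver signature: (domain, record_type) -> record values.
# An empty sequence means "no records of that type".
Lookup = Callable[[str, str], Sequence[str]]


def domain_accepts_mail(domain: str, lookup: Lookup) -> bool:
    """Decide from DNS data whether a domain could receive email."""
    mx_hosts = list(lookup(domain, "MX"))
    if mx_hosts:
        # A lone "." is a null MX: the domain explicitly refuses mail.
        return mx_hosts != ["."]
    # No MX records: fall back to the domain's own address records.
    return bool(lookup(domain, "A") or lookup(domain, "AAAA"))


# Made-up records for demonstration; a real lookup would query DNS.
FAKE_DNS = {
    ("example.com", "MX"): ["mail.example.com"],
    ("no-mx.example", "A"): ["192.0.2.10"],
    ("null-mx.example", "MX"): ["."],
}


def fake_lookup(domain: str, record_type: str) -> Sequence[str]:
    return FAKE_DNS.get((domain, record_type), [])


if __name__ == "__main__":
    for name in ["example.com", "no-mx.example", "null-mx.example", "missing.example"]:
        print(f"{name}: {domain_accepts_mail(name, fake_lookup)}")
```

Passing the resolver in as a parameter keeps the rule itself testable offline; timeouts and temporary DNS failures would need separate handling so that a transient error is not reported as an invalid address.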

3. Send a Confirmation Email

For critical applications or user registrations, send an email to the provided address containing a unique verification link. This is the most reliable way to confirm both the validity and deliverability of an email address.
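One possible way to build the verification link is a signed, expiring token, sketched below with the standard library only. Everything named here is an assumption made for this example: `SECRET_KEY`, the `https://example.com/verify` URL, the token layout, and the 24-hour lifetime. Storing a random token in a database is a common alternative to this stateless design.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config


def _sign(email: str, issued: str) -> str:
    message = f"{email}|{issued}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(email: str, now: Optional[float] = None) -> str:
    """Return '<issued-timestamp>.<signature>' bound to this email."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(email, issued)}"


def check_token(
    email: str,
    token: str,
    max_age_seconds: int = 24 * 3600,
    now: Optional[float] = None,
) -> bool:
    """Verify the signature first, then the token's age."""
    issued, _, signature = token.partition(".")
    try:
        issued_at = int(issued)
    except ValueError:
        return False
    expected = _sign(email, issued)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return current - issued_at <= max_age_seconds


def build_link(email: str) -> str:
    """Assemble the link to place in the confirmation email."""
    query = urlencode({"email": email, "token": make_token(email)})
    return f"https://example.com/verify?{query}"


if __name__ == "__main__":
    print(build_link("user@example.com"))
```

When the user follows the link, the server calls `check_token` with the email and token from the query string and marks the address as verified only if it returns True.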
