Anagram time complexity

Learn anagram time complexity with practical examples, diagrams, and best practices. Covers string, substring, time-complexity development techniques with visual explanations.

Understanding Anagram Time Complexity: A Deep Dive

Two word clouds, one with 'listen' and the other with 'silent', illustrating the concept of anagrams.

Explore various algorithms for checking if two strings are anagrams, analyze their time complexities, and learn how to optimize your solutions.

Anagrams are words or phrases formed by rearranging the letters of another, such as 'listen' and 'silent'. Determining if two strings are anagrams is a common problem in computer science interviews and competitive programming. While the concept is simple, the efficiency of checking for anagrams can vary significantly depending on the chosen algorithm. This article will break down different approaches, analyze their time complexities, and provide practical code examples.

What Defines an Anagram?

Before diving into algorithms, let's clearly define what constitutes an anagram. Two strings are considered anagrams if:

They contain the same characters.
They contain the same count of each character.
They have the same length.

Case sensitivity and whitespace handling are crucial considerations. For simplicity, we'll assume strings are lowercase and contain only alphabetic characters, ignoring whitespace and punctuation unless specified.

Method 1: Sorting Characters

One of the most intuitive ways to check for anagrams is to sort both strings alphabetically and then compare them. If the sorted versions are identical, the original strings are anagrams. This method inherently handles character counts and ensures all characters are present.

def are_anagrams_sort(s1, s2):
    if len(s1) != len(s2):
        return False
    return sorted(s1) == sorted(s2)

# Example usage:
print(are_anagrams_sort("listen", "silent")) # True
print(are_anagrams_sort("hello", "world"))  # False

Python implementation of the sorting method for anagram detection.

flowchart TD
    A[Start] --> B{Check Lengths Equal?}
    B -- No --> C[Return False]
    B -- Yes --> D[Sort String 1]
    D --> E[Sort String 2]
    E --> F{Compare Sorted Strings?}
    F -- Yes --> G[Return True]
    F -- No --> C
    C --> H[End]
    G --> H

Flowchart for the sorting-based anagram detection algorithm.

Time Complexity Analysis (Sorting Method)

Let n be the length of the strings.

Length Check: O(1)
Sorting: The dominant operation is sorting each string. Standard sorting algorithms like Timsort (used in Python) or Merge Sort have a time complexity of O(n log n).
Comparison: Comparing two sorted strings takes O(n) time.

Therefore, the overall time complexity for the sorting method is O(n log n), primarily due to the sorting step. The space complexity is O(n) if the sorting algorithm requires auxiliary space (e.g., Merge Sort) or if new sorted strings are created.

Method 2: Character Counting (Frequency Array/Hash Map)

A more efficient approach, especially for strings with a limited character set (like ASCII or Unicode), involves counting the frequency of each character. We can use an array (for fixed character sets) or a hash map (for broader character sets) to store counts. The idea is to increment counts for characters in the first string and decrement for characters in the second. If all counts are zero at the end, the strings are anagrams.

from collections import defaultdict

def are_anagrams_count(s1, s2):
    if len(s1) != len(s2):
        return False

    char_counts = defaultdict(int)

    for char in s1:
        char_counts[char] += 1

    for char in s2:
        char_counts[char] -= 1
        if char_counts[char] < 0: # Optimization: if count goes negative, it's not an anagram
            return False

    # After iterating through both strings, all counts should be zero
    # This check is technically redundant if the negative check is present and lengths are equal
    # for count in char_counts.values():
    #     if count != 0:
    #         return False

    return True

# Example usage:
print(are_anagrams_count("anagram", "nagaram")) # True
print(are_anagrams_count("rat", "car"))     # False

Python implementation of the character counting method using a defaultdict.

flowchart TD
    A[Start] --> B{Check Lengths Equal?}
    B -- No --> C[Return False]
    B -- Yes --> D[Initialize Character Count Map]
    D --> E{For each char in String 1}
    E --> F[Increment char count]
    F --> E
    E --> G{For each char in String 2}
    G --> H[Decrement char count]
    H --> I{Is count < 0?}
    I -- Yes --> C
    I -- No --> G
    G --> J{All counts zero?}
    J -- No --> C
    J -- Yes --> K[Return True]
    C --> L[End]
    K --> L

Flowchart for the character counting-based anagram detection algorithm.

Time Complexity Analysis (Character Counting Method)

Let n be the length of the strings and k be the size of the character set (e.g., 26 for lowercase English letters, 256 for ASCII).

Length Check: O(1)
First Loop (String 1): Iterating through s1 and updating counts takes O(n) time. Hash map operations (insertion/update) average O(1).
Second Loop (String 2): Iterating through s2 and updating counts takes O(n) time. Hash map operations average O(1).
Final Check (Optional): If you iterate through the map to ensure all counts are zero, it takes O(k) time.

Therefore, the overall time complexity for the character counting method is O(n + k). Since k is often a constant (e.g., 256), this simplifies to O(n), making it generally faster than the sorting method for large strings. The space complexity is O(k) to store the character counts.

💡

For very large strings or scenarios where memory is extremely constrained, the sorting method might be less memory-intensive if an in-place sort is used and string copies are avoided. However, for typical string lengths and character sets, the counting method is usually preferred for its superior time complexity.

Edge Cases and Considerations

When implementing anagram checks, consider the following:

Case Sensitivity: Should 'Listen' and 'silent' be considered anagrams? Typically, strings are converted to a common case (e.g., lowercase) before comparison.
Whitespace and Punctuation: Should 'a gentleman' and 'elegant man' be anagrams? Often, non-alphabetic characters are removed or ignored.
Empty Strings: Two empty strings are anagrams. An empty string and a non-empty string are not.
Unicode Characters: If dealing with a wide range of characters, a hash map is more appropriate than a fixed-size array for character counts.

Python (Case-Insensitive, Ignore Spaces)

from collections import defaultdict

def are_anagrams_robust(s1, s2): s1 = ''.join(filter(str.isalpha, s1)).lower() s2 = ''.join(filter(str.isalpha, s2)).lower()

if len(s1) != len(s2):
    return False

char_counts = defaultdict(int)
for char in s1:
    char_counts[char] += 1

for char in s2:
    char_counts[char] -= 1
    if char_counts[char] < 0:
        return False
return True

print(are_anagrams_robust("A gentleman", "Elegant man")) # True print(are_anagrams_robust("Listen!", "Silent.")) # True

JavaScript (Case-Insensitive, Ignore Spaces)

function areAnagramsRobust(s1, s2) { s1 = s1.replace(/[^a-zA-Z]/g, '').toLowerCase(); s2 = s2.replace(/[^a-zA-Z]/g, '').toLowerCase();

if (s1.length !== s2.length) {
    return false;
}

const charCounts = {};

for (let char of s1) {
    charCounts[char] = (charCounts[char] || 0) + 1;
}

for (let char of s2) {
    if (!charCounts[char]) {
        return false; // Character not found or count already zero
    }
    charCounts[char]--;
}

// No need for a final check if the above logic is sound and lengths are equal
return true;

}

console.log(areAnagramsRobust("A gentleman", "Elegant man")); // true console.log(areAnagramsRobust("Listen!", "Silent.")); // true