Attempting to add a value to a HashSet doesn't change the number of values in it

Why isn't my C# HashSet growing? Understanding Equality and Hashing

Explore common pitfalls when adding custom objects to a C# HashSet and learn how to correctly implement Equals() and GetHashCode() to ensure proper set behavior.

C#'s HashSet<T> stores unique elements and uses hash codes and equality comparisons to decide quickly whether an element is already present. Developers often run into a perplexing symptom: adding a seemingly new object to a HashSet doesn't increase its Count. Add() returns false and leaves the set unchanged whenever the set considers the new object equal to one it already holds, so the cause is almost always how equality and hashing are defined for a custom type. The same misunderstanding also produces the opposite symptom, where logically identical objects are all stored as separate elements. This article explains the mechanics of HashSet, shows why your custom objects might not be treated the way you expect, and provides clear guidance on how to fix it.

The Core Mechanism: How HashSet Determines Uniqueness

A HashSet<T> relies on two fundamental methods to manage its elements: Equals() and GetHashCode(). When you attempt to add an object to a HashSet, it performs the following steps:

  1. Calculate Hash Code: It first calls GetHashCode() on the object you're trying to add. This hash code determines which 'bucket' the object might belong to internally.
  2. Find Potential Matches: It then looks for existing objects in that same bucket.
  3. Perform Equality Check: For each object found in the bucket, it calls Equals() to compare the new object with the existing one. If Equals() returns true for any existing object, the HashSet considers the new object a duplicate and does not add it.
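
The steps above can be observed from the outside with a comparer that counts how often the set consults it. This is a minimal sketch: CountingComparer, HashCalls, and EqualsCalls are hypothetical names introduced for illustration, and the exact number of calls is an implementation detail of the runtime, so the example prints the counters without relying on specific values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps the default comparer and counts how often
// HashSet<T> asks for a hash code or an equality check.
public sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> _inner = EqualityComparer<T>.Default;

    public int HashCalls { get; private set; }
    public int EqualsCalls { get; private set; }

    public bool Equals(T x, T y)
    {
        EqualsCalls++;
        return _inner.Equals(x, y);
    }

    public int GetHashCode(T obj)
    {
        HashCalls++;
        return _inner.GetHashCode(obj);
    }
}

public static class AddStepsDemo
{
    public static void Main()
    {
        var comparer = new CountingComparer<string>();
        var names = new HashSet<string>(comparer);

        bool firstAdded = names.Add("Alice");   // set was empty, so the value is stored
        bool secondAdded = names.Add("Alice");  // equal to the stored value, so it is rejected

        Console.WriteLine($"First Add returned:  {firstAdded}");   // True
        Console.WriteLine($"Second Add returned: {secondAdded}");  // False
        Console.WriteLine($"Count: {names.Count}");                // 1
        Console.WriteLine($"GetHashCode calls: {comparer.HashCalls}, Equals calls: {comparer.EqualsCalls}");
    }
}
```

Checking the return value of Add() is usually the quickest way to confirm that the set rejected an element as a duplicate.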

If GetHashCode() and Equals() are not overridden in a custom class, the implementations inherited from System.Object are used. The default GetHashCode() returns a value tied to the object's identity rather than its contents (distinct instances usually get different values, although uniqueness is not guaranteed), and the default Equals() performs a reference equality check (i.e., it checks whether two variables refer to the exact same object in memory). Structs differ: they inherit a value-based Equals() from System.ValueType. Reference equality is often not what you want for custom types where two different instances should be considered 'equal' based on their values.
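
A short sketch of that default behavior, using a hypothetical Token class that overrides neither method. It also shows one way a set legitimately stays the same size: adding the same instance a second time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class with no Equals/GetHashCode overrides.
public class Token
{
    public string Value { get; }
    public Token(string value) { Value = value; }
}

public static class ReferenceEqualityDemo
{
    public static void Main()
    {
        var first = new Token("abc");
        var second = new Token("abc");   // same data, different instance
        var alias = first;               // same instance, second variable

        Console.WriteLine(first.Equals(second)); // False: different instances
        Console.WriteLine(first.Equals(alias));  // True: same instance

        var tokens = new HashSet<Token>();
        Console.WriteLine(tokens.Add(first));    // True
        Console.WriteLine(tokens.Add(alias));    // False: this instance is already in the set
        Console.WriteLine(tokens.Add(second));   // True: a different instance
        Console.WriteLine(tokens.Count);         // 2
    }
}
```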

Flowchart of HashSet's Add Operation

The Problem: Default Equals() and GetHashCode() for Custom Types

Consider a simple Person class. If you create two Person objects with the same name and age but as separate instances, the default Equals() method will treat them as distinct because they occupy different memory locations. Consequently, HashSet will add both, even if logically they represent the same person.

Let's look at an example where this problem manifests.

using System;
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // No overridden Equals or GetHashCode
}

public class Program
{
    public static void Main(string[] args)
    {
        var people = new HashSet<Person>();

        var person1 = new Person("Alice", 30);
        var person2 = new Person("Bob", 25);
        var person3 = new Person("Alice", 30); // Logically same as person1

        people.Add(person1);
        people.Add(person2);
        people.Add(person3);

        Console.WriteLine($"HashSet count: {people.Count}"); // Expected: 2, Actual: 3

        // This will also return false, as it's a new instance
        Console.WriteLine($"Contains new Alice: {people.Contains(new Person("Alice", 30))}"); // Expected: true, Actual: false
    }
}

Demonstration of HashSet failing to recognize logical duplicates without custom equality.

The Solution: Overriding Equals() and GetHashCode()

To make HashSet correctly identify unique custom objects based on their values, you need to override both Equals() and GetHashCode() in your custom class. These methods should define what 'equality' means for your type.
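
The definition of equality cuts both ways. If it is too narrow, duplicates slip in; if it is too broad, Add() silently rejects objects you consider distinct, which is the "count doesn't change" symptom. The sketch below assumes a hypothetical Member class whose equality looks only at Name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose equality looks only at Name, so Age is ignored.
public class Member : IEquatable<Member>
{
    public string Name { get; }
    public int Age { get; }

    public Member(string name, int age) { Name = name; Age = age; }

    public bool Equals(Member other) => !(other is null) && Name == other.Name;
    public override bool Equals(object obj) => Equals(obj as Member);
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class BroadEqualityDemo
{
    public static void Main()
    {
        var members = new HashSet<Member>();

        Console.WriteLine(members.Add(new Member("Alice", 30))); // True
        Console.WriteLine(members.Add(new Member("Alice", 31))); // False: equal by Name alone
        Console.WriteLine(members.Count);                        // 1
    }
}
```

If two objects that differ in a meaningful field are being rejected, the first thing to check is whether that field takes part in both Equals() and GetHashCode().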

Rules for Equals() and GetHashCode():

  1. Consistency: If two objects are equal according to Equals(), their GetHashCode() methods must return the same value. The reverse is not required: unequal objects may share a hash code.
  2. Symmetry: If a.Equals(b) is true, then b.Equals(a) must also be true.
  3. Transitivity: If a.Equals(b) is true and b.Equals(c) is true, then a.Equals(c) must also be true.
  4. Null Handling: x.Equals(null) must return false.
  5. Performance: GetHashCode() should be efficient and distribute hash codes evenly to minimize collisions.
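
Rules 1, 2, and 4 can be spot-checked on any pair of instances. The helper below is a rough sketch, not a complete test of the contract: EqualityRules.HoldFor and BrokenKey are hypothetical names, and transitivity and performance are not covered.

```csharp
using System;

// Hypothetical type that breaks rule 1: Equals compares Id,
// but GetHashCode is based on an unrelated field.
public sealed class BrokenKey
{
    public int Id { get; }
    public int Salt { get; }

    public BrokenKey(int id, int salt) { Id = id; Salt = salt; }

    public override bool Equals(object obj) => obj is BrokenKey other && Id == other.Id;
    public override int GetHashCode() => Salt;
}

public static class EqualityRules
{
    // Returns true when the pair (a, b) satisfies the rules that can be
    // checked on two instances: symmetry, null handling, hash consistency.
    public static bool HoldFor<T>(T a, T b) where T : class
    {
        bool symmetric = a.Equals(b) == b.Equals(a);
        bool rejectsNull = !a.Equals(null) && !b.Equals(null);
        bool hashConsistent = !a.Equals(b) || a.GetHashCode() == b.GetHashCode();
        return symmetric && rejectsNull && hashConsistent;
    }

    public static void Main()
    {
        // Two equal strings satisfy all three checks.
        Console.WriteLine(HoldFor("Alice", string.Concat("Al", "ice")));          // True

        // Equal by Id but with different hash codes, so rule 1 is violated.
        Console.WriteLine(HoldFor(new BrokenKey(1, 10), new BrokenKey(1, 20)));   // False
    }
}
```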

Here's how to correctly implement these methods for our Person class:

using System;
using System.Collections.Generic;

public class Person : IEquatable<Person>
{
    public string Name { get; set; }
    public int Age { get; set; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Override Equals for value-based equality
    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    public bool Equals(Person other)
    {
        if (other is null) return false; // 'is null' is unaffected by any overloaded == operator
        if (ReferenceEquals(this, other)) return true;

        return Name == other.Name && Age == other.Age;
    }

    // Override GetHashCode to be consistent with Equals
    public override int GetHashCode()
    {
        // Combine the hash codes of the properties that Equals compares.
        // HashCode.Combine is available in .NET Core 2.1+ and .NET Standard 2.1+;
        // on older frameworks, (Name, Age).GetHashCode() is a common alternative.
        return HashCode.Combine(Name, Age);
    }

    // Optional: For better debugging/output
    public override string ToString()
    {
        return $"Person {{Name='{Name}', Age={Age}}}";
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        var people = new HashSet<Person>();

        var person1 = new Person("Alice", 30);
        var person2 = new Person("Bob", 25);
        var person3 = new Person("Alice", 30); // Logically same as person1
        var person4 = new Person("Charlie", 40);

        people.Add(person1);
        people.Add(person2);
        people.Add(person3);
        people.Add(person4);

        Console.WriteLine($"HashSet count: {people.Count}"); // Expected: 3, Actual: 3

        // This will now return true because Equals and GetHashCode are correctly implemented
        Console.WriteLine($"Contains new Alice: {people.Contains(new Person("Alice", 30))}"); // Expected: true, Actual: true

        foreach (var p in people)
        {
            Console.WriteLine(p);
        }
    }
}

Correct implementation of Equals and GetHashCode for the Person class.
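
If the project targets C# 9 or later, a record is a possible shortcut, because the compiler generates value-based Equals() and GetHashCode() from the declared properties. The sketch below uses a hypothetical PersonRecord type rather than the Person class above, and assumes a runtime that supports records (.NET 5 or later).

```csharp
using System;
using System.Collections.Generic;

// Positional record: the compiler generates value-based Equals and GetHashCode.
public record PersonRecord(string Name, int Age);

public static class RecordDemo
{
    public static void Main()
    {
        var people = new HashSet<PersonRecord>
        {
            new PersonRecord("Alice", 30),
            new PersonRecord("Bob", 25),
            new PersonRecord("Alice", 30), // equal by value, so not stored again
        };

        Console.WriteLine(people.Count);                                   // 2
        Console.WriteLine(people.Contains(new PersonRecord("Alice", 30))); // True
    }
}
```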

Using IEqualityComparer<T> for External Equality

Sometimes, you might not have control over the class definition (e.g., it's from a third-party library), or you need different equality definitions for the same type in different contexts. In such cases, you can provide a custom IEqualityComparer<T> implementation to the HashSet constructor.

This approach allows you to define equality and hashing logic externally, without modifying the original class.

using System;
using System.Collections.Generic;

// Person class without custom Equals/GetHashCode
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public override string ToString()
    {
        return $"Person {{Name='{Name}', Age={Age}}}";
    }
}

// Custom IEqualityComparer for Person
public class PersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Name == y.Name && x.Age == y.Age;
    }

    public int GetHashCode(Person obj)
    {
        if (obj == null) return 0;
        return HashCode.Combine(obj.Name, obj.Age);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        // Pass the custom comparer to the HashSet constructor
        var people = new HashSet<Person>(new PersonComparer());

        var person1 = new Person("Alice", 30);
        var person2 = new Person("Bob", 25);
        var person3 = new Person("Alice", 30); // Logically same as person1

        people.Add(person1);
        people.Add(person2);
        people.Add(person3);

        Console.WriteLine($"HashSet count: {people.Count}"); // Expected: 2, Actual: 2

        Console.WriteLine($"Contains new Alice: {people.Contains(new Person("Alice", 30))}"); // Expected: true, Actual: true

        foreach (var p in people)
        {
            Console.WriteLine(p);
        }
    }
}

Using IEqualityComparer to define external equality for HashSet.
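
The same mechanism works with comparers that ship with .NET. As a small illustration of "different equality definitions for the same type", the sketch below stores the same strings in two sets, one using the default comparer and one using StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;

public static class BuiltInComparerDemo
{
    public static void Main()
    {
        var exact = new HashSet<string>();                                      // default, case-sensitive
        var ignoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase); // case-insensitive

        foreach (var name in new[] { "Alice", "ALICE", "alice" })
        {
            exact.Add(name);
            ignoreCase.Add(name);
        }

        Console.WriteLine(exact.Count);      // 3
        Console.WriteLine(ignoreCase.Count); // 1
    }
}
```

With the case-insensitive comparer, the second and third Add() calls return false, so this is another situation where adding a value leaves the count unchanged by design.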